Abstract

We establish tight results for rapid mixing of Gibbs Samplers for the Ferromagnetic Ising model on
general graphs. We show that if

$$(d-1)\tanh\beta < 1,$$

then there exists a constant C such that the discrete time mixing time of Gibbs Samplers for the Ferromagnetic
Ising model on any graph of n vertices and maximal degree d, where all interactions are bounded by $\beta$ and
the external fields are arbitrary, is bounded by $Cn\log n$. Moreover, the spectral gap is uniformly bounded away
from 0 for all such graphs as well as for infinite graphs of maximal degree d.

We further show that when $d\tanh\beta < 1$, with high probability over the Erdos-Renyi random graph
G(n, d/n), it holds that the mixing time of Gibbs Samplers is

$$n^{1+\Theta\left(\frac{1}{\log\log n}\right)}.$$

Both results are tight as it is known that the mixing time for random regular and Erdos-Renyi random graphs
is, with high probability, exponential in n when $(d-1)\tanh\beta > 1$ and $d\tanh\beta > 1$, respectively. To
our knowledge our results give the first tight sufficient conditions for rapid mixing of spin systems on general
graphs. Moreover, our results are the first rigorous results establishing exact thresholds for dynamics on
random graphs in terms of spatial thresholds on trees.

1 Introduction



Gibbs Sampling is a standard model in statistical physics for the temporal evolution of spin systems as well
as a popular technique for sampling high dimensional distributions. The study of the convergence rate of
Gibbs Samplers has thus attracted much attention from both statistical physics and theoretical computer
science. Traditionally such systems were studied on lattices. However, the applications in computer
science coupled with the interest in diluted spin-glasses in theoretical physics led to an extensive exploration
of properties of Gibbs sampling on general graphs of bounded degrees.

Below we will recall various definitions for measuring the convergence rate of the dynamics in spectral
and total variation forms. In particular, we will use the notion of rapid mixing to indicate convergence in
polynomial time in the size of the underlying graph.

A feature of most sufficient conditions for rapid convergence is that they either apply to general graphs,
but are not (known to be) tight, or the results are known to be tight but apply only to special families of
graphs, like 2-dimensional grids, or trees. Examples of results of the first type include the Dobrushin and the
Dobrushin-Shlosman conditions [6] and results by Vigoda and collaborators on colorings, see e.g. [28, 29, 9].
Examples of tight results for special graphs include the Ising model on 2-dimensional grids by Martinelli and
Olivieri [18, 19], see also [?], and the Ising model on trees [12, 2, 20, 17].

In this paper we consider Gibbs sampling for the Ferromagnetic Ising model on general graphs and provide
a criterion in terms of the maximal coupling constant $\beta$ and the maximal degree d which guarantees rapid
convergence for any graph and any external fields. The criterion is $(d-1)\tanh\beta < 1$. We further establish
that if $d\tanh\beta < 1$, then rapid mixing holds, with high probability, on the Erdos-Renyi random graph of
average degree d, thus proving the main conjecture of [23, 24]. Both results are tight as random d-regular graphs
and Erdos-Renyi random graphs of average degree d with no external fields have, with high probability, mixing
times that are exponential in the size of the graph when $(d-1)\tanh\beta > 1$ (resp. $d\tanh\beta > 1$) [8, 5]. To
our knowledge our results are the first tight sufficient conditions for rapid mixing of spin systems on general
graphs.

Our results are intimately related to the spatial mixing properties of the Gibbs measure, particularly on
trees. A model has the uniqueness property (roughly speaking) if the marginal spin at a vertex is not affected
by conditioning the spins of sets of distant vertices as the distance goes to infinity. On the infinite d-regular
tree, uniqueness of the Ferromagnetic Ising model holds when $(d-1)\tanh\beta < 1$ [15], corresponding to
the region of rapid mixing. It is known from the work of Weitz [30] that in fact spatial mixing occurs when
$(d-1)\tanh\beta < 1$ on any graph of maximum degree d.

It is widely believed that (some form of) spatial mixing implies fast mixing of the Gibbs sampler. However,
this is only known for amenable graphs and for a strong form of spatial mixing called "strong spatial
mixing" [?]. While lattices are amenable, there are many ensembles of graphs which are non-amenable such
as expander graphs. In fact, since most graphs of bounded degree are expanders, the strong spatial mixing
technique does not apply to them. Our results apply to completely general graphs and in particular various
families of random graphs whose neighbourhoods have exponential growth.

Our results also immediately give lower bounds on the spectral gap of the continuous time Glauber dynamics
which are independent of the size of the graph. This in turn allows us to establish a lower bound on the
spectral gap for the Glauber dynamics on infinite graphs of maximal degree bounded by d as well.

To understand our result related to the Erdos-Renyi random graph, we note that the threshold for the
Erdos-Renyi random graphs also corresponds to a spatial mixing threshold. For a randomly chosen vertex the local
graph neighbourhood is asymptotically distributed as a Galton-Watson branching process with offspring
distribution Poisson with mean d. Results of Lyons [15] imply that the uniqueness threshold on the Galton-Watson
tree is $d\tanh\beta < 1$ which is equal to the threshold for rapid mixing established here.

The correspondence between spatial and temporal mixing is believed to hold for many other important
models. We conjecture that when there is uniqueness on the d-regular tree for the antiferromagnetic Ising
model or the hardcore model then there is rapid mixing of the Gibbs sampler on all graphs of maximum
degree d in these models. It is known for both these models that the mixing time on almost all random
d-regular bipartite graphs is exponential in n, the size of the graph, beyond the uniqueness threshold [25, 8, 5],
so our conjecture is that uniqueness on the tree exactly corresponds to rapid mixing of the Gibbs sampler. We
summarize our main contributions as follows:

• Our results are the first results providing tight criteria for rapid mixing of Gibbs samplers on general 
graphs. 

• Our results show that the threshold is given by a corresponding threshold for a tree model, in particular
in the case of random graphs and dilute mean field models. We note that in the theory of spin-glasses
it is conjectured that for many spin systems on random diluted (bounded average degree) graphs the
"dynamical threshold" for rapid mixing is given by a corresponding "replica" threshold, i.e., a spatial
threshold for a corresponding spin system on trees (see for example [21, 13, 22]). To the best of our
knowledge our results are the first to rigorously establish such thresholds.

Although the proof we present here is short and elegant, it is fundamentally different from previous approaches
in the area. In particular:

• It is known that the block dynamics technique [18, 19] cannot be extended to the non-amenable
setting since the bounds rely crucially on the small boundary to volume ratio, which does not hold
for expander graphs; see a more detailed discussion in [7].

• Weitz [30] noted that the tree of self avoiding walks construction establishes mixing results on amenable
graphs but not for non-amenable graphs. In general, correlation inequalities/spatial mixing have
previously only been shown to imply rapid mixing on amenable graphs; an excellent reference for this is the
thesis of Weitz [31].

• The technique of censoring the dynamics is another recent development in the analysis of Gibbs samplers
[30] and can for instance be used to translate results on the block dynamics to those on the single site
dynamics. Its standard application does not, however, yield new results for non-amenable graphs.

• While tight results have been established in the case of trees [12, 2, 20, 17], which are non-amenable,
the methods do not generalize to more general graphs as they make fundamental use of properties of the
tree, in particular the presence of leaves at the base. Indeed, the fact that the median degree of a tree is 1
illustrates the difference between trees and regular graphs.

The main novelty in our approach is a new application of the censoring technique. In the standard use of
censoring, a censored Markov chain is constructed which is shown to mix rapidly, and then the censoring inequality
implies rapid mixing of the original dynamics. Our approach is a subtle conceptual shift. Rather than construct
a censoring scheme which converges to the stationary distribution, we construct a sequence of censored
dynamics which do not converge to stationarity. They do, however, allow us to establish a sequence of recursive
bounds from which we derive our estimates of the spectral gap and the mixing time.

Another serious technical challenge of the paper was determining the correct mixing time for the Gibbs
sampler on Erdos-Renyi random graphs. The necessary estimate is to bound the mixing time on the local
neighbourhoods of the graph which are Galton-Watson branching processes with Poisson offspring distribution.
This is done via an involved distributional recursive analysis of the cutwidth of these branching process trees.

In the following subsections we state our results, then recall the definition of the Ising model, Gibbs
Sampling and Erdos-Renyi random graphs followed by a statement of a general theorem from which both of our
main results follow. We then sketch the main steps of the proof followed by detailed proofs. We then show
how our spectral gap bounds on finite graphs can be extended to infinite graphs. Finally we conclude with open
problems involving other systems.

1.1 Our Results 

In our main result we establish the following tight criterion for rapid mixing of Gibbs sampling for general
graphs in terms of the maximal degree.

Theorem 1 For any integer $d \ge 2$ and inverse temperature $\beta > 0$ such that

$$(d-1)\tanh\beta < 1, \tag{1}$$

there exist constants $0 < \lambda^*(d,\beta),\ C(d,\beta) < \infty$ such that on any graph of maximum degree d on n
vertices, the discrete time mixing time of the Gibbs sampler for the ferromagnetic Ising model with all edge
interactions bounded by $\beta$, and arbitrary external fields, is bounded above by $Cn\log n$.

Further the continuous time spectral gap of the dynamics is bounded below by $\lambda^*$. The spectral gap bound
applies also for infinite graphs.

We note that a lower bound of $\Omega(n\log n)$ on the mixing time follows from the general results of [10].
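
As a small numerical illustration of condition (1), the sketch below computes the largest inverse temperature allowed for a given maximal degree, using only the identity $(d-1)\tanh\beta < 1 \iff \beta < \operatorname{arctanh}(1/(d-1))$ for $d \ge 3$. The function names are hypothetical and chosen for this example only.

```python
import math


def critical_beta(d: int) -> float:
    """Supremum of beta satisfying (d - 1) * tanh(beta) < 1, for integer d >= 2."""
    if d < 2:
        raise ValueError("the condition is stated for integer d >= 2")
    if d == 2:
        return math.inf  # tanh(beta) < 1 holds for every finite beta
    return math.atanh(1.0 / (d - 1))


def in_rapid_mixing_regime(d: int, beta: float) -> bool:
    """Check condition (1): (d - 1) * tanh(beta) < 1."""
    return (d - 1) * math.tanh(beta) < 1.0


if __name__ == "__main__":
    for d in (3, 4, 5):
        print(d, critical_beta(d))
```

For d = 3 the threshold is $\operatorname{arctanh}(1/2) = \tfrac12\log 3$, and it decreases as d grows.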

The techniques we develop here also allow us to derive results for graphs with unbounded degrees. Of
particular interest is the following tight result:

Theorem 2 Let $\beta > 0$ and $d > 0$ and consider the Erdos-Renyi random graph G on n vertices where each
edge is present independently with probability d/n. Then for all $\beta$ such that $d\tanh\beta < 1$, there exist
$c(d,\beta)$ and $C(d,\beta)$ such that with high probability over G, the discrete time mixing time $\tau_{mix}$ of the
Gibbs sampler for the ferromagnetic Ising model with all edge interactions bounded by $\beta$ and arbitrary external
field satisfies

$$n^{1+\frac{c}{\log\log n}} \le \tau_{mix} \le n^{1+\frac{C}{\log\log n}},$$

while the continuous time spectral gap satisfies

$$n^{-\frac{c}{\log\log n}} \ge \mathrm{Gap} \ge n^{-\frac{C}{\log\log n}}.$$

Both results are tight as estimates obtained in [8, 5], following [25] and proving a conjecture from [23, 24],
imply that for the Ising model without external fields, the mixing time of the Gibbs sampler is with high
probability $\exp(\Omega(n))$ on random d-regular graphs if $(d-1)\tanh\beta > 1$ and on Erdos-Renyi random graphs of
average degree d when $d\tanh\beta > 1$.

1.2 Standard Background 

In the following subsection we recall some standard background on the Ising Model, Gibbs Sampling and 
Erdos-Renyi Random Graphs. 

1.2.1 The Ising Model 

The Ising model is perhaps the oldest and simplest discrete spin system defined on graphs. This model defines
a distribution on labelings of the vertices of the graph by + and −.

Definition 1 The (homogeneous) Ising model on a graph G with inverse temperature $\beta$ is a distribution on
configurations $\{\pm\}^V$ such that

$$P(\sigma) = \frac{1}{Z(\beta)}\exp\Big(\beta\sum_{\{v,u\}\in E}\sigma(v)\sigma(u)\Big) \tag{2}$$

where $Z(\beta)$ is a normalizing constant.

More generally, we will be interested in Ising models defined by:

$$P(\sigma) = \frac{1}{Z}\exp(H(\sigma)), \tag{3}$$

where the Hamiltonian $H(\sigma)$ is defined as

$$H(\sigma) = \sum_{\{v,u\}\in E}\beta_{u,v}\,\sigma(v)\sigma(u) + \sum_{v}h_v\,\sigma(v)$$

and where the $h_v$ are arbitrary and $\beta_{u,v} \ge 0$ for all u and v. In the more general case we will write
$\beta = \max_{u,v}\beta_{u,v}$.
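
To make the definition concrete, the following sketch evaluates the Hamiltonian and the Gibbs distribution (3) by brute-force enumeration on a tiny graph. The data layout is an assumption made for this example, not something fixed by the text: `couplings[u][v]` stores $\beta_{u,v}$ symmetrically, `h[v]` stores the external field, spins are ±1, and vertices are comparable labels such as integers. Enumeration is only feasible for a handful of vertices.

```python
import itertools
import math


def hamiltonian(sigma, couplings, h):
    """H(sigma) = sum over edges of beta_uv * sigma(u) * sigma(v) + sum over v of h_v * sigma(v)."""
    pair_term = sum(
        b * sigma[u] * sigma[v]
        for u, nbrs in couplings.items()
        for v, b in nbrs.items()
        if u < v  # count each undirected edge once
    )
    field_term = sum(h[v] * sigma[v] for v in sigma)
    return pair_term + field_term


def gibbs_distribution(couplings, h):
    """Exact P(sigma) = exp(H(sigma)) / Z, enumerating all 2^|V| configurations."""
    vertices = sorted(couplings)
    weights = {}
    for spins in itertools.product((-1, 1), repeat=len(vertices)):
        sigma = dict(zip(vertices, spins))
        weights[spins] = math.exp(hamiltonian(sigma, couplings, h))
    z = sum(weights.values())
    return {spins: w / z for spins, w in weights.items()}


if __name__ == "__main__":
    # A single edge {0, 1} with beta = 1 and no external field.
    edge = {0: {1: 1.0}, 1: {0: 1.0}}
    print(gibbs_distribution(edge, {0: 0.0, 1: 0.0}))
```

On a single edge with $\beta = 1$ and no field, aligned configurations are $e^{2}$ times more likely than misaligned ones, as (2) predicts.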



1.2.2 Gibbs Sampling 



The Gibbs sampler (also Glauber dynamics or heat bath) is a Markov chain on configurations where a
configuration $\sigma$ is updated by choosing a vertex v uniformly at random and assigning it a spin according to the
Gibbs distribution conditional on the spins on $G - \{v\}$.

Definition 2 Given a graph G = (V, E) and an inverse temperature $\beta$, the Gibbs sampler is the discrete
time Markov chain on $\{\pm\}^V$ where given the current configuration $\sigma$ the next configuration $\sigma'$ is obtained by
choosing a vertex v in V uniformly at random and

• Letting $\sigma'(w) = \sigma(w)$ for all $w \ne v$.

• $\sigma'(v)$ is assigned the spin + with probability

$$\frac{\exp\big(h_v + \sum_{u:(v,u)\in E}\beta_{u,v}\sigma(u)\big)}{\exp\big(h_v + \sum_{u:(v,u)\in E}\beta_{u,v}\sigma(u)\big) + \exp\big(-h_v - \sum_{u:(v,u)\in E}\beta_{u,v}\sigma(u)\big)}.$$
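
The update rule of Definition 2 can be sketched in a few lines. This is a minimal illustration under assumed data structures (spins stored as ±1 in a dict `sigma`, couplings in a symmetric dict of dicts `couplings[v][u]`, fields in `h`); none of these names come from the paper. The update probability is evaluated through the equivalent form $(1+\tanh f)/2$, where f is the local field, to avoid overflow for large fields.

```python
import math
import random


def plus_probability(v, sigma, couplings, h):
    """Probability that an update at v assigns spin +1, given the neighbouring spins."""
    field = h[v] + sum(b * sigma[u] for u, b in couplings[v].items())
    # exp(f) / (exp(f) + exp(-f)) equals (1 + tanh(f)) / 2.
    return 0.5 * (1.0 + math.tanh(field))


def gibbs_step(sigma, couplings, h, rng):
    """One discrete-time step: choose v uniformly at random and resample sigma[v] in place."""
    v = rng.choice(sorted(sigma))
    sigma[v] = 1 if rng.random() < plus_probability(v, sigma, couplings, h) else -1
    return v


if __name__ == "__main__":
    # Path 0 - 1 - 2 with beta = 1 on both edges and no external field.
    path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
    fields = {0: 0.0, 1: 0.0, 2: 0.0}
    state = {0: 1, 1: 1, 2: 1}
    rng = random.Random(0)
    for _ in range(10):
        gibbs_step(state, path, fields, rng)
    print(state)
```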

We will be interested in the time it takes the dynamics to get close to the distributions (2) and (3). The
mixing time $\tau_{mix}$ of the chain is defined as the number of steps needed in order to guarantee that the chain,
starting from an arbitrary state, is within total variation distance 1/(2e) from the stationary distribution. The
mixing time has the property that for any integer k and initial configuration x,

$$\big\|P(X_{k\tau_{mix}} = \cdot \mid X_0 = x) - P(\cdot)\big\|_{TV} \le e^{-k}. \tag{4}$$



It is well known that Gibbs sampling is a reversible Markov chain with stationary distribution P. Let
$1 = \lambda_1 > \lambda_2 \ge \ldots \ge \lambda_m \ge -1$ denote the eigenvalues of the transition matrix of Gibbs sampling. The
spectral gap is defined as $\min\{1-\lambda_2,\ 1-|\lambda_m|\}$ and the relaxation time $\tau$ is the inverse of the spectral gap.
The relaxation time can be given in terms of the Dirichlet form of the Markov chain by the equation

$$\tau = \sup\left\{\frac{2\sum_{\sigma}P(\sigma)f(\sigma)^2}{\sum_{\sigma\ne\sigma'}Q(\sigma,\sigma')\big(f(\sigma)-f(\sigma')\big)^2} \;:\; \sum_{\sigma}P(\sigma)f(\sigma) = 0\right\} \tag{5}$$

where $f:\{\pm\}^V\to\mathbb{R}$ is any function on configurations, the sum in the denominator runs over ordered pairs,
$Q(\sigma,\sigma') = P(\sigma)P(\sigma\to\sigma')$ and $P(\sigma\to\sigma')$ is the transition probability from $\sigma$ to $\sigma'$. We use the
result that for reversible Markov chains the relaxation time satisfies



where $\tau_{mix}$ is the mixing time (see e.g. [1]) and so by bounding the relaxation time we can bound the mixing
time up to a polynomial factor.

While our results are given for the discrete time Gibbs Sampler described above, it will at times be
convenient to consider the continuous time version of the model. Here sites are updated at rate 1 by independent
Poisson clocks. The two chains are closely related: since the continuous time chain performs n updates per unit
time on average, the relaxation time of the discrete chain is n times the relaxation time of the continuous time
Markov chain (see e.g. [1]).

1.2.3 Erdos-Renyi Random Graphs and Other Models of graphs 

The Erdos-Renyi random graph G(n, p) is the graph with n vertices V and random edges E where each
potential edge $(u,v) \in V\times V$ is chosen independently with probability p. We take p = d/n where d > 1 is
fixed. In the case d < 1, it is well known that with high probability all components of G(n, p) are of logarithmic
size which implies immediately that the dynamics mix in polynomial time for all $\beta$. A random d-regular graph
$\mathcal{G}(n,d)$ is a graph uniformly chosen from all d-regular graphs on n labeled vertices.

Asymptotically the local neighbourhoods of G(n, d/n) and $\mathcal{G}(n,d)$ are trees. In the latter case it is a tree
where every node has exactly d − 1 offspring (except for the root which has d offspring). In the former case it
is essentially a Galton-Watson branching process with offspring distribution which is essentially Poisson with
mean d. Recall that the tree associated with a Galton-Watson branching process with offspring distribution
X is a random rooted tree defined as follows: for every vertex in the tree its number of offspring vertices is
independent with distribution X.
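
For experimentation, G(n, d/n) can be sampled directly from its definition. The sketch below is illustrative only; the adjacency-dict representation and the function name are assumptions of this example. It flips one independent coin per unordered pair of vertices, so it takes time quadratic in n.

```python
import random
from itertools import combinations


def erdos_renyi(n, d, rng):
    """Sample G(n, d/n) as an adjacency dict {vertex: set of neighbours}."""
    p = d / n
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # each potential edge is kept independently
            adj[u].add(v)
            adj[v].add(u)
    return adj


if __name__ == "__main__":
    graph = erdos_renyi(1000, 3.0, random.Random(0))
    degrees = [len(nbrs) for nbrs in graph.values()]
    print("average degree:", sum(degrees) / len(degrees), "max degree:", max(degrees))
```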

1.3 A General Theorem 

Theorems 1 and 2 are both proved as special cases of the following theorem which may be of independent
interest. For a graph G = (V, E) and vertex $v \in V$, we write B(v, R) for the ball of radius R around v, i.e.,
the set of all vertices that are of distance at most R from v. We write $S(v,R) = B(v,R)\setminus B(v,R-1)$ for the
sphere of radius R around v.
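
The ball and sphere can be computed with a breadth-first search truncated at depth R. The sketch below assumes the graph is given as an adjacency dict (an illustrative choice, not fixed by the text).

```python
from collections import deque


def ball_distances(adj, v, R):
    """Graph distances from v for every vertex within distance R (truncated BFS)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        w = queue.popleft()
        if dist[w] == R:
            continue  # do not explore beyond radius R
        for u in adj[w]:
            if u not in dist:
                dist[u] = dist[w] + 1
                queue.append(u)
    return dist


def ball(adj, v, R):
    """B(v, R): all vertices at distance at most R from v."""
    return set(ball_distances(adj, v, R))


def sphere(adj, v, R):
    """S(v, R) = B(v, R) minus B(v, R - 1): vertices at distance exactly R."""
    return {u for u, k in ball_distances(adj, v, R).items() if k == R}


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(ball(path, 2, 1), sphere(path, 2, 2))
```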

Theorem 3 Let G be a graph on $n \ge 2$ vertices such that there exist constants $R, T, X \ge 1$ such that the
following three conditions hold for all $v \in V$:

• Volume: The volume of the ball B(v, R) satisfies $|B(v,R)| \le X$.

• Local Mixing: For any configuration $\eta$ on S(v, R) the continuous time mixing time of the Gibbs sampler
on B(v, R − 1) with fixed boundary condition $\eta$ is bounded above by T.

• Spatial Mixing: For each vertex $u \in S(v,R)$ define

$$a_u = \sup_{\eta^+,\eta^-}\ P(\sigma_v = + \mid \sigma_{S(v,R)} = \eta^+) - P(\sigma_v = + \mid \sigma_{S(v,R)} = \eta^-) \tag{7}$$

where the supremum is over configurations $\eta^+, \eta^-$ on S(v, R) which differ only at u with $\eta^+_u = +$,
$\eta^-_u = -$. Then

$$\sum_{u\in S(v,R)} a_u \le \frac{1}{4}. \tag{8}$$

Then starting from the all + and all − configurations in continuous time the monotone coupling couples with
probability at least 7/8 by time $T\lceil\log 8X\rceil(3+\lceil\log_2 n\rceil)$.

It follows that the mixing time of the Gibbs sampler in continuous time satisfies

$$\tau_{mix} \le T\lceil\log 8X\rceil(3+\lceil\log_2 n\rceil)$$

while the spectral gap satisfies

$$\mathrm{Gap} \ge \big(T\lceil\log 8X\rceil\big)^{-1}\log 2.$$

We will write Vol(R, X) for the statement that $|B(v,R)| \le X$ for all $v \in V$, write SM(R) for the statement
that (8) holds for all $v \in V$ and write LM(R, T) for the statement that the continuous time mixing time of the
Gibbs sampler on B(v, R − 1) is bounded above by T for any fixed boundary condition $\eta$. Using this notation
the theorem states that:

$$\mathrm{Vol}(R,X)\ \text{and}\ \mathrm{SM}(R)\ \text{and}\ \mathrm{LM}(R,T) \implies \tau_{mix} \le T\lceil\log 8X\rceil(3+\lceil\log_2 n\rceil). \tag{9}$$
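
To get a feel for the size of the bounds in Theorem 3, the following sketch evaluates $T\lceil\log 8X\rceil(3+\lceil\log_2 n\rceil)$ and the corresponding spectral gap bound for given constants. It assumes that log denotes the natural logarithm (as suggested by the use of (4) in the proof) and that $\log_2$ is the base-2 logarithm; the function names are hypothetical.

```python
import math


def theorem3_mixing_bound(T, X, n):
    """Continuous-time mixing bound T * ceil(log(8X)) * (3 + ceil(log2(n)))."""
    return T * math.ceil(math.log(8 * X)) * (3 + math.ceil(math.log2(n)))


def theorem3_gap_bound(T, X):
    """Spectral gap lower bound log(2) / (T * ceil(log(8X)))."""
    return math.log(2) / (T * math.ceil(math.log(8 * X)))


if __name__ == "__main__":
    # Illustrative constants only; T and X would come from LM(R, T) and Vol(R, X).
    print(theorem3_mixing_bound(T=5.0, X=10, n=1024))
    print(theorem3_gap_bound(T=5.0, X=10))
```

Only the factor $3+\lceil\log_2 n\rceil$ depends on n, which is why the continuous time bound is logarithmic in n when R, T and X are constants, and why the gap bound does not depend on n at all.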

In the conclusion section of the paper we state a much more general version of Theorem 3 which applies
to general monotone Gibbs distributions and allows the sets B(v, R) to be arbitrary sets containing v (where
S(v, R) is replaced by the inner vertex boundary of B(v, R)). We note that the implication proven here for
monotone systems showing

Spatial Mixing => Temporal Mixing

is stronger than that established in previous work [27, 18, 5, 7] where it is shown that Strong Spatial Mixing
implies Temporal Mixing for graphs with sub-exponential growth (Strong Spatial Mixing says that the quantity
$a_u$ decays exponentially in the distance between u and v). In particular, Theorem 3 applies also to graphs
with exponential growth and for a very general choice of blocks B(v, R). Both Theorems 1 and 2 deal with
expanding graphs where Theorem 3 is needed.

A different way to look at our result is as a strengthening of the Dobrushin-Shlosman condition [6]. Stated
in its strongest form in [?, Theorem 2.5], it says that if the effect on the spin at a vertex v of disagreements
on the boundary of blocks containing v is small, averaged over all blocks containing v, then
the model has uniqueness and the block dynamics mixes rapidly. Theorem 4 requires only that for each vertex
there exists a block such that the boundary effect is small. This is critical in expanders and random graphs
where the boundary of a block is proportional to its volume.

1.4 Proofs Sketch 

We briefly discuss the main ideas in our proofs of Theorems 3, 1 and 2.

1.4.1 Theorem 3 and Censoring

The proof of Theorem 3 is based on considering the monotone coupling of the continuous time dynamics
starting with all + and all − states and showing that there exists a constant s such that at time ks, for all
vertices v, the probability that the two measures have not coupled at v is at most $2^{-k}$.

In order to prove such a claim by induction, it is useful to censor the dynamics from time ks onwards by
not performing any updates outside a ball of radius R around v. Recent results of Peres and Winkler show that
doing so will result in a larger disagreement probability at v than without any censoring.

For the censored dynamics we use the triangle inequality and compare the marginal probability at v for the 
two measures by comparing each distribution to the stationary distribution at v given the boundary condition 
and then comparing the two stationary distributions at v given the two boundary conditions. 

By using LM(R, T) and running the censored dynamics for $T\lceil\log 8X\rceil$ time we can ensure that the error of
the first type contributes at most 2/(8X) in the case where the two boundary conditions are different, and therefore
at most 2/(8X) times the expected number of disagreements at the boundary, which is bounded by $2^{-k-2}$ by
induction. By using SM(R) and the induction hypothesis we obtain that the expected discrepancy between the
distributions at v given the two different boundary conditions is at most $2^{-k-2}$. Combining the two estimates
yields the desired result. As this gives an exponential rate of decay in the expected discrepancy it establishes a
constant lower bound on the spectral gap.

The proofs of Theorems 1 and 2 follow from (9) by establishing bounds on Vol, SM and LM.

1.4.2 Bounding the Volume 

The easiest step in both Theorems 1 and 2 is to establish Vol(R, X). For graphs of degree at most d, the volume
grows as $O((d-1)^R)$ and using arguments from [24] one can show that if $R = (\log\log n)^2$ then for G(n, d/n)
one can take X of order $d^R\log n$.

1.4.3 Spatial Mixing Bounds 

Establishing spatial mixing bounds relies on the fact that for trees without external fields this is a standard
calculation. The presence of external fields can be dealt with using a Lemma from [2] which shows that
for the Ising model on trees, the difference in magnetization is maximized when there are no external fields. A
crucial tool which allows us to obtain results for non-tree graphs is the Weitz tree [30]. This tree allows us to
write magnetization ratios for the Ising model on general graphs using a related model on the tree. In [24] it
was shown that the Weitz tree can be used to construct an efficient algorithm different than Gibbs Sampling for
sampling Ising configurations under the conditions of Theorems 1 and 2 (the running time of the algorithm is
$n^{1+c(\beta)}$ compared to $C(\beta)\,n\log n$ established here).

1.4.4 Local Mixing Bounds 

In order to derive local mixing bounds we generalize results from [2] on the mixing times in terms of cut-width
to deal with arbitrary external fields. Further, for the case of Erdos-Renyi random graphs and $R = (\log\log n)^2$
we show that with high probability the cut-width is of order $\log n/\log\log n$.



2 Proofs 

In this section we prove Theorems 3, 1 and 2 while the verification of the Vol, SM and LM conditions is
deferred to the following sections. We begin by recalling the notion of monotone coupling and the result by
Peres-Winkler on censoring. We then proceed with the proof of the theorems.

2.1 Monotone Coupling



For two configurations $X, Y \in \{-,+\}^V$ we let $X \ge Y$ denote that X is greater than or equal to Y pointwise.
When all the interactions are positive, it is well known that the Ising model is a monotone system under
this partial ordering, that is if $X \ge Y$ then,

$$P\big(\sigma_v = + \mid \sigma_{V\setminus\{v\}} = X_{V\setminus\{v\}}\big) \ge P\big(\sigma_v = + \mid \sigma_{V\setminus\{v\}} = Y_{V\setminus\{v\}}\big).$$

As it is a monotone system there exists a coupling of Markov chains $\{X^x_t\}_{x\in\{-,+\}^V}$ such that marginally
each has the law of the Gibbs Sampler with starting configuration $X^x_0 = x$ and further that if $x \ge y$ then for
all t, $X^x_t \ge X^y_t$. This is referred to as the monotone coupling and can be constructed as follows: let $v_1, v_2, \ldots$
be a random sequence of vertices updated by the Gibbs Sampler and associate with them iid random variables
$U_1, U_2, \ldots$ distributed as U[0, 1] which determine how the site is updated. At the ith update the site $v = v_i$ is
updated to + if

$$U_i \le \frac{\exp\big(h_v + \sum_{u:(v,u)\in E}\beta_{u,v}\sigma(u)\big)}{\exp\big(h_v + \sum_{u:(v,u)\in E}\beta_{u,v}\sigma(u)\big) + \exp\big(-h_v - \sum_{u:(v,u)\in E}\beta_{u,v}\sigma(u)\big)}$$

and to − otherwise. It is well known that such transitions preserve the partial ordering which guarantees that if
$x \ge y$ then $X^x_t \ge X^y_t$ by the monotonicity of the system. In particular this implies that it is enough to bound
the time taken to couple from the all + and all − starting configurations.
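
The construction above can be sketched as follows: both chains receive the same vertex $v_i$ and the same uniform variable $U_i$ at every step. The data layout (`couplings[v][u]` for $\beta_{u,v}$, `h[v]` for the fields, spins ±1) and all names are assumptions of this example. With non-negative couplings, the chain started from all + stays pointwise above the chain started from all −.

```python
import math
import random


def plus_probability(v, sigma, couplings, h):
    """Heat-bath probability of spin +1 at v, written as (1 + tanh(local field)) / 2."""
    field = h[v] + sum(b * sigma[u] for u, b in couplings[v].items())
    return 0.5 * (1.0 + math.tanh(field))


def coupled_step(top, bottom, couplings, h, rng):
    """Update both chains in place using the same vertex and the same uniform variable."""
    v = rng.choice(sorted(couplings))
    u = rng.random()
    for sigma in (top, bottom):
        sigma[v] = 1 if u <= plus_probability(v, sigma, couplings, h) else -1
    return v


def steps_until_coupled(couplings, h, rng, max_steps=100_000):
    """Number of steps until the all-plus and all-minus chains agree, or None if not reached."""
    top = {v: 1 for v in couplings}
    bottom = {v: -1 for v in couplings}
    for step in range(1, max_steps + 1):
        coupled_step(top, bottom, couplings, h, rng)
        if top == bottom:
            return step
    return None


if __name__ == "__main__":
    n = 8  # cycle on 8 vertices, beta = 0.4 on every edge, so (d - 1) tanh(beta) < 1
    cycle = {v: {(v - 1) % n: 0.4, (v + 1) % n: 0.4} for v in range(n)}
    print(steps_until_coupled(cycle, {v: 0.0 for v in range(n)}, random.Random(0)))
```

Once the two extremal chains agree they agree forever, and by monotonicity every other starting configuration is sandwiched between them.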

2.2 Censoring 

In general it is believed that doing more updates should lead to a more mixed state. For the ferromagnetic Ising 
model and other monotone systems this intuition was proved by Peres and Winkler. They showed that starting 
from the all + (or all — ) configurations adding updates only improves mixing. More formally they proved the 
following proposition. 

Proposition 1 Let $u_1, \ldots, u_m$ be a sequence of vertices and let $i_1, \ldots, i_l$ be a strictly increasing subsequence
of $1, \ldots, m$. Let $X^+$ (resp. $X^-$) be a random configuration constructed by starting from the all + (resp. all
−) configuration and running Gibbs updates sequentially on $u_1, \ldots, u_m$. Similarly let $Y^+$ (resp. $Y^-$) be a
random configuration constructed by starting from the all + (resp. all −) configuration and running Gibbs
updates sequentially on the vertices $u_{i_1}, \ldots, u_{i_l}$. Then

$$Y^- \preceq X^- \preceq X^+ \preceq Y^+,$$

where $A \preceq B$ denotes that B stochastically dominates A in the partial ordering of configurations.

This result in fact holds for random sequences of vertices of random length and random subsequences
provided the choice of sequence is independent of the choices that the Gibbs sampler makes. The result
remains unpublished but its proof can be found in [26].

2.3 Proof of Theorem 3

Proof: [Theorem 3]

Let $X^+_t, X^-_t$ denote the Gibbs sampler on G started respectively from the all + and all − configurations,
coupled using the monotone coupling described in Section 2.1. Fix some vertex $v \in G$. We will define
two new censored chains $Z^+_t$ and $Z^-_t$ starting from the all + and all − configurations respectively. Take
$S \ge 0$ to be some arbitrary constant. Until time S we set both $Z^+_t$ and $Z^-_t$ to be simply equal to $X^+_t$ and
$X^-_t$ respectively. After time S all updates outside of B(v, R − 1) are censored, that is $Z^+_t$ and $Z^-_t$ remain
unchanged on $V\setminus B(v,R-1)$ after time S but inside B(v, R − 1) share all the same updates with $X^+_t$ and
$X^-_t$.

In particular this means that for $Z^+_t$ and $Z^-_t$ the spins on S(v, R) are fixed after time S. By monotonicity
of the updates we have $Z^+_t \ge Z^-_t$ and $X^+_t \ge X^-_t$ for all t. After time S the censored processes are simply the
Gibbs sampler on B(v, R − 1) with boundary condition $X^\pm_S(S(v,R))$. By assumption we have that the mixing
time of this dynamics is bounded above by T and by equation (4) if $t = T\lceil\log 8X\rceil$ then

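$$\Big|P\big(Z^+_{S+t}(v) = + \mid \mathcal{F}_S\big) - P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^+_S(S(v,R))\big)\Big| \le \frac{1}{8X} \tag{10}$$

and similarly for $Z^-$, where $\mathcal{F}_S$ denotes the sigma-algebra generated by the updates up to time S. Now

$$\begin{aligned}
P\big(Z^+_{S+t}(v) \ne Z^-_{S+t}(v) \mid \mathcal{F}_S\big) &= P\big(Z^+_{S+t}(v) = + \mid \mathcal{F}_S\big) - P\big(Z^-_{S+t}(v) = + \mid \mathcal{F}_S\big) \\
&= I\big(X^+_S(B(v,R)) \ne X^-_S(B(v,R))\big)\Big[P\big(Z^+_{S+t}(v) = + \mid \mathcal{F}_S\big) - P\big(Z^-_{S+t}(v) = + \mid \mathcal{F}_S\big)\Big],
\end{aligned} \tag{11}$$

since if $X^+_S(B(v,R)) = X^-_S(B(v,R))$ then the censored processes remain equal within B(v, R) for all time
as they receive the same updates. Now we split up the right hand side as follows, using the triangle inequality:

$$\begin{aligned}
& I\big(X^+_S(B(v,R)) \ne X^-_S(B(v,R))\big)\Big[P\big(Z^+_{S+t}(v) = + \mid \mathcal{F}_S\big) - P\big(Z^-_{S+t}(v) = + \mid \mathcal{F}_S\big)\Big] \\
&\quad\le I\big(X^+_S(B(v,R)) \ne X^-_S(B(v,R))\big)\Big[\,\Big|P\big(Z^+_{S+t}(v) = + \mid \mathcal{F}_S\big) - P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^+_S(S(v,R))\big)\Big| \\
&\qquad + \Big|P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^+_S(S(v,R))\big) - P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^-_S(S(v,R))\big)\Big| \\
&\qquad + \Big|P\big(Z^-_{S+t}(v) = + \mid \mathcal{F}_S\big) - P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^-_S(S(v,R))\big)\Big|\,\Big].
\end{aligned} \tag{12}$$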

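Now

$$\begin{aligned}
& E\, I\big(X^+_S(B(v,R)) \ne X^-_S(B(v,R))\big)\Big|P\big(Z^+_{S+t}(v) = + \mid \mathcal{F}_S\big) - P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^+_S(S(v,R))\big)\Big| \\
&\quad\le \frac{1}{8X}\,E\, I\big(X^+_S(B(v,R)) \ne X^-_S(B(v,R))\big) \\
&\quad\le \frac{1}{8X}\sum_{u\in B(v,R)} P\big(X^+_S(u) \ne X^-_S(u)\big) \\
&\quad\le \frac{1}{8}\max_{u\in V} P\big(X^+_S(u) \ne X^-_S(u)\big)
\end{aligned} \tag{13}$$

where the first inequality follows from equation (10), the second is a union bound over $u \in B(v,R)$ and the
final inequality follows from the volume assumption. Similarly for $Z^-$.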

If $\eta^+ \ge \eta^-$ are two configurations on S(v, R) which differ only on the set $U \subseteq S(v,R)$ then by changing
the vertices one at a time by the spatial mixing condition we have that

$$P\big(\sigma_v = + \mid \sigma_{S(v,R)} = \eta^+\big) - P\big(\sigma_v = + \mid \sigma_{S(v,R)} = \eta^-\big) \le \sum_{u\in U} a_u.$$

It follows that

$$\begin{aligned}
& E\,\Big|P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^+_S(S(v,R))\big) - P\big(\sigma_v = + \mid \sigma_{S(v,R)} = X^-_S(S(v,R))\big)\Big| \\
&\quad\le E\sum_{u\in S(v,R)} a_u\, I\big(X^+_S(u) \ne X^-_S(u)\big) \le \frac{1}{4}\max_{u\in V} P\big(X^+_S(u) \ne X^-_S(u)\big).
\end{aligned} \tag{14}$$

Combining equations (11), (12), (13) and (14) we have that

$$P\big(Z^+_{S+t}(v) \ne Z^-_{S+t}(v)\big) \le \frac{1}{2}\max_{u\in V} P\big(X^+_S(u) \ne X^-_S(u)\big).$$

By the Censoring Lemma we have that $Z^+_t \succeq X^+_t \succeq X^-_t \succeq Z^-_t$ and so,

$$P\big(X^+_{S+t}(v) \ne X^-_{S+t}(v)\big) \le P\big(Z^+_{S+t}(v) \ne Z^-_{S+t}(v)\big).$$

Combining the previous two equations and taking a maximum over v we have that

$$\max_{u\in V} P\big(X^+_{S+t}(u) \ne X^-_{S+t}(u)\big) \le \frac{1}{2}\max_{u\in V} P\big(X^+_S(u) \ne X^-_S(u)\big). \tag{15}$$

Now S is arbitrary so iterate equation (15) to get that

$$\max_{u\in V} P\Big(X^+_{t(3+\lceil\log_2 n\rceil)}(u) \ne X^-_{t(3+\lceil\log_2 n\rceil)}(u)\Big) \le 2^{-3-\lceil\log_2 n\rceil} \le \frac{1}{2en}.$$

Taking a union bound over all $u \in V$ we have that

$$P\Big(X^+_{t(3+\lceil\log_2 n\rceil)} \ne X^-_{t(3+\lceil\log_2 n\rceil)}\Big) \le \frac{1}{2e}$$

and so the mixing time is bounded above by $T\lceil\log 8X\rceil(3+\lceil\log_2 n\rceil)$. Since the expected number of
disagreements decays exponentially with a rate of at least $t^{-1}\log 2$, i.e.

$$E\,\#\big\{u \in V : X^+_s(u) \ne X^-_s(u)\big\} \le 2n\,e^{-s\,t^{-1}\log 2},$$

it follows by standard results (see e.g. [4]) that the spectral gap is bounded below by $t^{-1}\log 2$. ■

2.4 Proofs of Theorems 1 and 2

We now prove Theorems 1 and 2 except for the result for infinite graphs which will be proven in
Section 6. Theorem 1 follows from (9) and the following lemmas.

Lemma 1 Let G = (V, E) be a graph of maximal degree d. Then Vol(R, X) holds with

$$X = 1 + d\sum_{\ell=1}^{R}(d-1)^{\ell-1}.$$

Lemma 2 Let G = (V, E) be a graph of maximal degree d and consider the ferromagnetic Ising model on G
with arbitrary external fields. Then LM(R, T) holds with

$$T = 80\,d^3X^3e^{5\beta(X+1)}, \qquad X = 1 + d\sum_{\ell=1}^{R}(d-1)^{\ell-1}.$$

Lemma 3 Let G = (V, E) be a graph with maximum degree d and let $v \in V$. Suppose that $(d-1)\tanh\beta < 1$.
Let R be an integer large enough so that

$$\frac{d(d-1)^{R-1}\tanh^R\beta}{1-(d-1)\tanh\beta} \le \frac{1}{4}.$$

Then SM(R) holds.

We note that Lemma 1 is trivial. As for Lemma 2, it is easy to prove a bound with a finite T depending
on R only, assuming all external fields are bounded. We provide an analysis with a tighter bound which applies
also when the external fields are not bounded. The proof is based on cut-width. The main step is proving
Lemma 3, which uses recursions on trees, a comparison argument from [2] and the Weitz tree.
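
As an illustration of how the radius in Lemma 3 depends on d and $\beta$, the sketch below searches for the smallest integer R meeting the lemma's condition. It assumes the condition reads $d(d-1)^{R-1}\tanh^R\beta\,/\,(1-(d-1)\tanh\beta) \le 1/4$; the function names are hypothetical. Since the left-hand side shrinks by the factor $(d-1)\tanh\beta < 1$ each time R increases by one, the search always terminates.

```python
import math


def spatial_mixing_lhs(d, beta, R):
    """Left-hand side d (d-1)^(R-1) tanh(beta)^R / (1 - (d-1) tanh(beta))."""
    t = math.tanh(beta)
    return d * (d - 1) ** (R - 1) * t ** R / (1.0 - (d - 1) * t)


def smallest_radius(d, beta, threshold=0.25):
    """Smallest integer R >= 1 with spatial_mixing_lhs(d, beta, R) <= threshold."""
    if (d - 1) * math.tanh(beta) >= 1.0:
        raise ValueError("requires (d - 1) * tanh(beta) < 1")
    R = 1
    while spatial_mixing_lhs(d, beta, R) > threshold:
        R += 1
    return R


if __name__ == "__main__":
    for beta in (0.1, 0.3, 0.5):
        print(beta, smallest_radius(3, beta))
```

The required radius grows without bound as $(d-1)\tanh\beta$ approaches 1, which in turn inflates the constants X and T of Lemmas 1 and 2.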

The upper bound in Theorem 2 follows from (9) and the following lemmas.

Lemma 4 Let G be a random graph distributed as G(n, d/n). Then Vol(R, X) holds with high probability
over G with

$$R = (\log\log n)^2, \qquad X = d^R\log n.$$

Lemma 5 Let G be a random graph distributed as G(n, d/n) where d is fixed. There exists a constant C(d)
such that LM(R, T) holds with high probability over G with

i?=(loglogn) 2 , T = e 10 ^^). 

Lemma 6 Let G be a random graph distributed as G(n, d/n) where d is fixed and d tanh β < 1. Then SM(R, T) holds with high probability over G with R = (log log n)^2.

The main challenge in extending the proof from bounded degree graphs to G(n, d/n) is obtaining good enough control on the local geometry of the graph. In particular, we obtain very tight tail estimates on the cut-width of a Galton-Watson tree with Poisson offspring distribution of (log log n)^2 levels. A lower bound on the mixing time of $n^{1+\Omega(1/\log\log n)}$ was shown in [24] by analyzing large star subgraphs of G(n, d/n). Recall that a star is a rooted tree of depth 1 and that in an Erdos-Renyi random graph, with high probability, there are stars of degree $\Omega(\log n/\log\log n)$.



3 Volume Growth 

We begin with verification of the Volume Growth condition. Since Lemma 1 is trivial, this section will be devoted to the proof of Lemma 4 and other geometric properties of random graphs. The reader who is interested only in the proof of Theorem 1 may skip the remainder of this section.

The results stated in this section will require the notion of tree excess. For a graph G = (V, E) we let t(G) denote the tree excess of G, i.e.,

$$t(G) = |E| - |V| + 1.$$

Note that the volume bound in the following lemma implies the statement of Lemma 4.
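As a quick illustration of the definition, the tree excess of a connected graph can be read off from its vertex and edge counts. The sketch below is only an illustration; the function name `tree_excess` and the edge-list representation are assumptions made here, not notation from the paper.

```python
def tree_excess(num_vertices, edges):
    """Return |E| - |V| + 1 for a connected simple graph.

    `edges` is an iterable of unordered vertex pairs; (u, w) and (w, u)
    are counted once. Connectivity is assumed, not checked.
    """
    distinct_edges = {frozenset(e) for e in edges}
    return len(distinct_edges) - num_vertices + 1


path = [(0, 1), (1, 2), (2, 3)]           # a tree: excess 0
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a single cycle: excess 1
print(tree_excess(4, path), tree_excess(4, cycle))
```

A connected graph has tree excess 0 exactly when it is a tree, and tree excess 1 exactly when it is unicyclic, which is the case used later for the balls B(v, R).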

Lemma 7 Let d be fixed and let G be a random graph distributed as G(n, d/n). The following hold with high probability over G when R = (log log n)^2 for all v ∈ G:

• B(v, R) has a spanning tree T(v, R) which is stochastically dominated by a Galton-Watson branching process with offspring distribution Poisson(d).

• The tree excess satisfies t(v, R) ≤ 1.

• The volume of B(v, R) is bounded by

$$|B(v,R)| \le d^R\log n.$$
Proof: We construct a spanning tree T(v, R) of B(v, R) in a standard manner. Take some arbitrary ordering of the vertices of G. Start with the vertex v and attach it to all its neighbors in G. Now take the minimal vertex in S(v, 1), according to the ordering, and attach it to all its neighbors in G which are not already in the tree. Repeat this for each of the vertices in S(v, 1) in increasing order. Repeat this for S(v, 2) and continue until S(v, R − 1), which completes T(v, R). By construction this is a spanning tree for B(v, R). The construction can be viewed as a breadth first search of B(v, R) starting from v and exploring according to the vertex ordering. By a standard argument T(v, R) is stochastically dominated by a Galton-Watson branching process with offspring distribution Poisson(d) with R levels, thus proving the first statement.
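The exploration just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the graph is given as an adjacency dictionary `adj`, vertices are comparable (so "increasing order" makes sense), and the names `bfs_ball_tree` and `ball_tree_excess` are hypothetical helpers introduced here.

```python
def bfs_ball_tree(adj, v, R):
    """Explore B(v, R) level by level, each level in increasing vertex order.

    Returns (ball_vertices, tree_edges), where tree_edges are the edges of
    the spanning tree T(v, R) built by the exploration.
    """
    dist = {v: 0}
    tree_edges = []
    level = [v]
    for r in range(R):
        next_level = []
        for u in sorted(level):
            for w in sorted(adj[u]):
                if w not in dist:          # attach only unexplored neighbours
                    dist[w] = r + 1
                    tree_edges.append((u, w))
                    next_level.append(w)
        level = next_level
    return set(dist), tree_edges


def ball_tree_excess(adj, v, R):
    """Tree excess |E| - |V| + 1 of the subgraph induced on B(v, R)."""
    ball, _ = bfs_ball_tree(adj, v, R)
    ball_edges = {frozenset((u, w)) for u in ball for w in adj[u] if w in ball}
    return len(ball_edges) - len(ball) + 1


# A 5-cycle: the ball of radius 2 is the whole cycle and is unicyclic.
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(ball_tree_excess(cycle5, 0, 1), ball_tree_excess(cycle5, 0, 2))
```

The edges of the ball that are not in `tree_edges` are exactly the unexplored edges counted by the tree excess.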

Since the volume of B(v, R) equals the volume of T(v, R) it suffices to bound the latter. For this we use a variant of an argument from [24]. We let $Z_r$ denote the volume of a Galton-Watson tree of depth r with offspring distribution N where N is Poisson(d). We claim that for all t > 0 it holds that

$$\sup_r E[\exp(tZ_rd^{-r})] < \infty. \qquad (17)$$

Writing s = s(t) for the value of the supremum, it follows from Markov's inequality that

$$s \ge P[Z_R > d^R\log n]\exp(t\log n)$$

and so

$$P[Z_R > d^R\log n] \le s\exp(-t\log n),$$

which is o(1/n) if t > 1. This implies that |B(v, R)| ≤ d^R log n for all v by a union bound and proves the volume bound of the lemma.

For (17), let $N_i$ be independent copies of N and note that

$$E\exp(tZ_{r+1}d^{-(r+1)}) = E\exp\Big(td^{-(r+1)}\sum_{i=1}^{Z_r}N_i\Big) = E\Big[E\Big[\exp\Big(td^{-(r+1)}\sum_{i=1}^{Z_r}N_i\Big)\,\Big|\,Z_r\Big]\Big] \qquad (18)$$

$$= E\big[(E[\exp(td^{-(r+1)}N)])^{Z_r}\big] = E\exp\big(Z_r\log(E\exp(td^{-(r+1)}N))\big),$$

which recursively relates the exponential moments of $Z_{r+1}$ to the exponential moments of $Z_r$. In particular since all the exponential moments of $Z_1$ exist, $E\exp(tZ_r) < \infty$ for all t and r. When 0 < s < 1,

$$E\exp(sN) = \sum_{i=0}^{\infty}\frac{s^iEN^i}{i!} \le 1 + sd + s^2\sum_{i=2}^{\infty}\frac{EN^i}{i!} \le \exp(sd(1+\alpha s)) \qquad (19)$$

provided α is sufficiently large. Now fix a t and let $t_r = t\exp\big(2\alpha t\sum_{i=r+1}^{\infty}d^{-i}\big)$. For some sufficiently large j we have that $\exp\big(2\alpha t\sum_{i=r+1}^{\infty}d^{-i}\big) \le 2$ and $t_rd^{-r} < 1$ for all r ≥ j. Then for r ≥ j by equations (18) and (19),

$$E\exp(t_{r+1}Z_{r+1}d^{-(r+1)}) = E\exp\big(\log(E\exp(t_{r+1}d^{-(r+1)}N))Z_r\big)$$

$$\le E\exp\big(t_{r+1}(1+\alpha t_{r+1}d^{-(r+1)})Z_rd^{-r}\big)$$

$$\le E\exp\big(t_{r+1}(1+2\alpha td^{-(r+1)})Z_rd^{-r}\big)$$

$$\le E\exp(t_rZ_rd^{-r})$$

and so

$$\sup_{r\ge j}E\exp(tZ_rd^{-r}) \le \sup_{r\ge j}E\exp(t_rZ_rd^{-r}) = E\exp(t_jZ_jd^{-j}) < \infty$$

which completes the proof of (17).

It remains to bound the tree excess. In the construction of T(v, R) there may be some edges in B(v, R) which are not explored and so are not in T(v, R). Each edge between u, w ∈ V(v, R) which is not explored in the construction of T(v, R) is present in B(v, R) independently with probability d/n. There are at most $d^{2R}$ unexplored edges and

$$P[\mathrm{Binomial}(d^{2R}, d/n) > 1] \le d^{4R}(d/n)^2 \le n^{-2+o(1)}$$

for any fixed d. So by a union bound with high probability we have that t(v, R) ≤ 1 for all v. ■



4 Local Mixing 

In this section we prove Lemma 2 and Lemma 5. The proof that the local mixing condition holds for graphs of bounded degree, bounded volume and bounded external field is standard. Indeed the reader who is interested in Theorem 1 for models with bounded external fields may skip this section.

4.1 Cut-Width Bounds

The main tool in bounding the mixing time will be the notion of cut-width used in [2]. Recall that the cut-width ξ of a finite graph G = (V, E) is

$$\xi = \min_{\pi}\max_{i}\Big|\big(\{v_{\pi(j)} : j \le i\}\times\{v_{\pi(j)} : j > i\}\big)\cap E\Big|$$

where the minimum is taken over all permutations π of the labels of the vertices $v_1, \dots, v_n$ in V.
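For small graphs the cut-width can be computed by brute force directly from this definition: for every ordering of the vertices, take the largest number of edges crossing between the first i vertices and the rest, then minimise over orderings. The sketch below is illustrative only; the names `ordering_width` and `cut_width` are assumptions made here, and the exhaustive search is only practical for a handful of vertices.

```python
from itertools import permutations


def ordering_width(order, edges):
    """Largest number of edges crossing any prefix/suffix split of `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return max(
        (sum(1 for u, w in edges
             if min(pos[u], pos[w]) < i <= max(pos[u], pos[w]))
         for i in range(1, len(order))),
        default=0,  # a single vertex has no cuts
    )


def cut_width(vertices, edges):
    """Cut-width by exhaustive search over all vertex orderings."""
    return min(ordering_width(p, edges) for p in permutations(vertices))


path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(cut_width(range(4), path), cut_width(range(5), star), cut_width(range(4), cycle))
```

The star example shows why high-degree vertices matter: a star with k leaves has cut-width ⌈k/2⌉, obtained by placing the centre in the middle of the ordering.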

We will prove the following result, which generalizes the results of [2] to the case with boundary conditions. The proof follows those given in [2] and [16].

Lemma 8 Consider the Ising model on G with interaction strengths bounded by β, arbitrary external field, cut-width ξ, and maximal degree d. Then the relaxation time of the discrete time Gibbs sampler is at most

$$n^2e^{4\beta(\xi+d)}.$$

Proof: We follow the notation of [12]. Fix an ordering "<" of the vertices in V which achieves the cut-width. Define a canonical path γ(σ, η) between two configurations σ, η as follows: let $v_1 < v_2 < \dots < v_\ell$ be the vertices on which σ and η differ. The kth configuration in the path $\eta = \sigma^{(0)}, \sigma^{(1)}, \dots, \sigma^{(\ell)} = \sigma$ is defined by $\sigma^{(k)}_v = \sigma_v$ for $v \le v_k$ and $\sigma^{(k)}_v = \eta_v$ for $v > v_k$. Then by the method of canonical paths (see e.g. [11, 16]) the relaxation time is bounded by

$$\tau \le \sup_{e}\sum_{\sigma,\eta\,:\,e\in\gamma(\sigma,\eta)}|\gamma(\sigma,\eta)|\,\frac{P(\sigma)P(\eta)}{Q(e)}$$

where the supremum is over all pairs of configurations e = (x, y) which differ at a single vertex, where e ∈ γ(σ, η) denotes that x and y are consecutive configurations in the canonical path γ(σ, η), and where Q((x, y)) = P(x)P(x → y).

Let e = (x, y) be a pair of configurations which differ only at v. For a pair of configurations σ, η let $\varphi_e(\sigma,\eta)$ denote the configuration which is given by $\varphi_e(\sigma,\eta)_{v'} = \eta_{v'}$ for v' < v and $\varphi_e(\sigma,\eta)_{v'} = \sigma_{v'}$ for v' ≥ v. In particular note that $\varphi_e(\sigma,\eta)_v = \sigma_v = y_v$. Then a simple consequence of the labeling gives that

$$P(\sigma)P(\eta) \le P(x)P(\varphi_e(\sigma,\eta))e^{4\beta\xi}$$

and a crude bound on the transition probabilities gives that

$$P(x\to y) \ge \frac{1}{n}\,\frac{e^{h_vy_v-d\beta}}{e^{h_vy_v-d\beta}+e^{-h_vy_v+d\beta}}.$$

Then

$$\sum_{\sigma,\eta\,:\,e\in\gamma(\sigma,\eta)}\frac{P(\sigma)P(\eta)}{Q(e)} = \sum_{\sigma,\eta\,:\,e\in\gamma(\sigma,\eta)}\frac{P(\sigma)P(\eta)}{P(x)P(x\to y)} \le ne^{4\beta\xi}\big(1+e^{-2h_vy_v+2d\beta}\big)\sum_{\sigma,\eta\,:\,e\in\gamma(\sigma,\eta)}P(\varphi_e(\sigma,\eta)).$$

The labeling is constructed such that for each e the map $\varphi_e$ is injective and as noted above we have that $\varphi_e(\sigma,\eta)_v = \sigma_v = y_v$ and so

$$\sum_{\sigma,\eta\,:\,e\in\gamma(\sigma,\eta)}P(\varphi_e(\sigma,\eta)) \le \sum_{\sigma'\,:\,\sigma'_v = y_v}P(\sigma') \le \frac{e^{h_vy_v+d\beta}}{e^{h_vy_v+d\beta}+e^{-h_vy_v-d\beta}} = \frac{1}{1+e^{-2h_vy_v-2d\beta}}$$

and hence, since each canonical path has length at most n,

$$\tau \le n^2e^{4\beta\xi}\,\frac{1+e^{-2h_vy_v+2d\beta}}{1+e^{-2h_vy_v-2d\beta}} \le n^2e^{4\beta(\xi+d)}$$

as required. ■
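The crude bound on the transition probabilities used in the proof can be checked numerically. The sketch below assumes the standard heat-bath rule for the Ising Gibbs sampler, in which the updated spin at v takes the value y with probability proportional to exp(y(h_v + Σ_u β_uv σ_u)); the function names are hypothetical, and the factor 1/n for choosing the vertex v is left out of both sides.

```python
import math
from itertools import product


def prob_update_to(y_v, h_v, neighbour_terms):
    """Heat-bath probability that an update at v sets sigma_v = y_v.

    neighbour_terms holds beta_uv * sigma_u for each neighbour u of v.
    """
    local_field = h_v + sum(neighbour_terms)
    return 1.0 / (1.0 + math.exp(-2.0 * y_v * local_field))


def crude_lower_bound(y_v, h_v, d, beta):
    """Bound valid for any neighbour configuration when deg(v) <= d, beta_uv <= beta."""
    return 1.0 / (1.0 + math.exp(-2.0 * h_v * y_v + 2.0 * d * beta))


d, beta = 3, 0.4
worst_gap = min(
    prob_update_to(y, h, [beta * s for s in spins]) - crude_lower_bound(y, h, d, beta)
    for y in (-1, 1)
    for h in (-2.0, 0.0, 1.5)
    for spins in product((-1, 1), repeat=d)
)
print(worst_gap >= -1e-12)
```

The bound is attained when all d neighbours oppose y_v with the maximal interaction strength, which is the worst case the proof has to allow for.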

We now need to establish a bound relating the relaxation time to the mixing time. While we would like to apply equation (6) directly to Lemma 8, if the external fields go to infinity then the right hand side of equation (6) also goes to infinity. So that our results hold for any external field we establish the following lemma.

Lemma 9 Consider the Ising model on G with interaction strengths bounded by β, cut-width ξ, arbitrary external field and maximal degree d. Then the mixing time of the Gibbs sampler satisfies

$$\tau_{mix} \le 80n^3e^{5\beta(\xi+d)}.$$

Proof: Define $h = 3\log n + 6\beta\xi + 4d\beta + 10$ and let U denote the set of vertices $U = \{v\in V : |h_v| \ge h\}$. This is the set of vertices with external fields so strong that it is highly unlikely that they are updated to a value other than $\mathrm{sign}(h_v)$. Let $\tilde G$ denote the graph induced by the vertex set $\tilde V = V\setminus U$, and let $\tilde P$ denote the Ising model on $\tilde G$ with the same interaction strengths $\beta_{uv}$ but with modified external field

$$\tilde h_v = h_v + \sum_{u\in U\,:\,(u,v)\in E}\beta_{uv}\,\mathrm{sign}(h_u).$$

This is of course just the original Ising model restricted to $\tilde V$ and conditioned on $\sigma_u = \mathrm{sign}(h_u)$ for u ∈ U. We now analyze the continuous time Gibbs sampler of $\tilde P$. By Lemma 8 its relaxation time satisfies

$$\tilde\tau \le ne^{4\beta(\xi+d)}$$

since restricting to $\tilde G$ can only decrease the cut-width and maximum degree and since the discrete and continuous relaxation times differ by a factor of n. To invoke (6) we bound $\min_\sigma\tilde P(\sigma)$. By our construction we have that

$$\max_{v\in\tilde V}|\tilde h_v| \le h + d\beta.$$

Now

$$\min_\sigma\tilde H(\sigma) = \min_\sigma\Big(\sum_{\{v,u\}\in\tilde E}\beta_{uv}\sigma(v)\sigma(u) + \sum_{v\in\tilde V}\tilde h_v\sigma(v)\Big) \ge -n(2d\beta+h)$$

and similarly $\max_\sigma\tilde H(\sigma) \le n(2d\beta+h)$. Now the normalizing constant $\tilde Z$ satisfies

$$\tilde Z = \sum_\sigma\exp(\tilde H(\sigma)) \le 2^n\exp(n(2d\beta+h))$$

so finally

$$\min_\sigma\tilde P(\sigma) \ge \frac{\exp(\min_\sigma\tilde H(\sigma))}{\tilde Z} \ge 2^{-n}\exp(-n(4d\beta+2h)).$$

By equation (6) this implies that the mixing time of the continuous time Gibbs sampler on $\tilde P$ satisfies

$$\tilde\tau_{mix} \le \tilde\tau\Big(1+\tfrac{1}{2}\log\big(\min_\sigma\tilde P(\sigma)^{-1}\big)\Big) \le ne^{4\beta(\xi+d)}\Big(1+n\big(\tfrac{1}{2}\log 2+2d\beta+h\big)\Big).$$

We set $T = 8n^2he^{4\beta(\xi+d)} \ge 4\tilde\tau_{mix}$.

We now return to the continuous time dynamics on all of G. Let A denote the event that every vertex u ∈ U is updated at least once before time T. The probability that a vertex u is updated by time T is $1 - e^{-T}$ and so by a union bound

$$P(A) \ge 1 - ne^{-T} \ge 1 - ne^{-h} \ge 1 - e^{-10}.$$

Let B be the event that for every vertex u ∈ U every update up to time 2T updates the spin to $\mathrm{sign}(h_u)$. For a single vertex u ∈ U and any configuration σ when u is updated,

$$P(u \text{ is updated to } -\mathrm{sign}(h_u)) \le \frac{e^{-|h_u|+d\beta}}{e^{|h_u|-d\beta}+e^{-|h_u|+d\beta}} \le e^{-2|h_u|+2d\beta}. \qquad (20)$$

The number of updates in U up to time 2T is distributed as a Poisson random variable with mean 2T|U| so

$$P(B) \ge P\big(\mathrm{Po}(2Tne^{-2h+2d\beta}) = 0\big)$$

$$\ge 1 - 2Tne^{-2h+2d\beta}$$

$$\ge 1 - 8n^3he^{4\beta(\xi+d)-2h+2d\beta}$$

$$= 1 - 8he^{-h-10}$$

$$\ge 1 - 8e^{-10}$$

where the last inequality follows from the fact that $e^x > x$.

Let $X_t$ denote the Gibbs sampler with respect to P and let $Y_t$ be its restriction to $\tilde V$. Conditioned on A and B, by time T every vertex in U has been updated, it has been updated to $\mathrm{sign}(h_u)$, and it remains with this spin until time 2T. For T ≤ t ≤ 2T let $Y_t$ denote the Gibbs sampler on $\tilde V$ with respect to $\tilde P$ with initial condition $Y_T = X_T(\tilde V)$. From time T to 2T couple $X_t$ and $Y_t$ with the same updates (that is, inside $\tilde V$ the same choice of $\{v_i\}$ and $\{U_i\}$ in the notation of Section 2.1). Then conditioned on A and B we have that $Y_t = X_t(\tilde V)$ for T ≤ t ≤ 2T.

We can now use our bound on the mixing time of the Gibbs sampler with respect to $\tilde P$. Since $T \ge 4\tilde\tau_{mix}$ we have that

$$\|P(Y_{2T} = \cdot) - \tilde P(\cdot)\|_{TV} \le e^{-4}. \qquad (21)$$

Under the stationary measure P it follows from equation (20) that for any u ∈ U,

$$P(\sigma_u = \mathrm{sign}(h_u)) \ge 1 - e^{-2|h_u|+2d\beta}$$

and hence by a union bound

$$P(\sigma_u = \mathrm{sign}(h_u),\ \forall u\in U) \ge 1 - ne^{-2h+2d\beta} \qquad (22)$$

and so

$$\|P(\sigma\in\cdot \mid \sigma_u = \mathrm{sign}(h_u),\ \forall u\in U) - P(\sigma\in\cdot)\|_{TV} \le ne^{-2h+2d\beta}.$$

Since the projection of P onto $\tilde V$ conditioned on $\sigma_u = \mathrm{sign}(h_u)$ for all u ∈ U is simply $\tilde P$ it follows that

$$\|P(X_{2T} = \cdot) - P(\cdot)\|_{TV} \le P(A^c) + P(B^c) + \|P(\sigma\in\cdot \mid \sigma_u = \mathrm{sign}(h_u),\ \forall u\in U) - P(\sigma\in\cdot)\|_{TV} + \|P(Y_{2T}\in\cdot) - \tilde P(\sigma\in\cdot)\|_{TV}$$

$$\le 9e^{-10} + ne^{-2h+2d\beta} + e^{-4} \le \frac{1}{2e}$$

which establishes 2T as an upper bound on the mixing time $\tau_{mix}$. By a crude bound $h \le 10ne^{\beta(\xi+d)}$, which establishes

$$\tau_{mix} \le 2T \le 8n^2he^{4\beta(\xi+d)} \le 80n^3e^{5\beta(\xi+d)}$$

as required. ■

4.2 Proof of Local Mixing for Graphs of Bounded Degree 

We can now prove Lemma 2.

Proof: The proof follows immediately from Lemma 9 applied to the balls B(v, R), noting that ξ is always smaller than the number of vertices in the graph, which is bounded by X. ■

4.3 Cut-width in Random Graphs and Galton Watson Trees 

The main result we prove in this section is the following. 

Lemma 10 For every d there exists a constant C'(d) such that the following holds. Let T be the tree given by the first ℓ levels of a Galton-Watson branching process tree with Poisson(d) offspring distribution. Then ξ(T), the cut-width of T, is stochastically dominated by the distribution C'ℓ + Po(d).

Using this result it is not hard to prove the upper bound on the local mixing of Lemma 5.

Proof: We first note that by Lemma 7, with high probability for all v, the tree excess of the ball B(v, R) is at most one. This implies that the cut-width of B(v, R) is at most 1 more than the cut-width of the spanning tree T(v, R) of B(v, R), whose distribution is dominated by a Galton-Watson tree with Poisson offspring distribution with mean d. We thus conclude by Lemma 10 that with high probability for all v ∈ V the distribution of the cut-width of B(v, R) is bounded by C'R + Po(d). Since the probability that Po(d) exceeds c log n/ log log n for large enough c is of order $n^{-2}$, we obtain by a union bound that with high probability for all v it holds that B(v, R) has a cut-width of at most (c + C') log n/ log log n. Similarly with high probability the maximal degree in G is of order log n/ log log n. Recalling that X is at most $d^R\log n$ and applying Lemma 9 yields the required result. ■

The proof of Lemma 10 follows by induction from the following two lemmas.

Lemma 11 Let T be a tree rooted at ρ with degree k and let $T_1, \dots, T_k$ be the subtrees connected to the root. Then the cut-width of T satisfies

$$\xi(T) \le \max_{1\le i\le k}\big(\xi(T_i) + k + 1 - i\big).$$
Proof: For each subtree $T_i$ let $u^{(i)}_1, \dots, u^{(i)}_{|V_i|}$ be a sequence of vertices which achieves the cut-width $\xi(T_i)$. Concatenate these sequences as

$$\rho,\ u^{(1)}_1, \dots, u^{(1)}_{|V_1|},\ u^{(2)}_1, \dots,\ u^{(k)}_{|V_k|}$$

which can easily be seen to achieve the bound $\max_i\,(\xi(T_i) + k + 1 - i)$. ■
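The concatenation argument can be checked mechanically on small trees. The sketch below is an illustration under stated assumptions: a rooted tree is encoded as a nested list of its child subtrees (a leaf is `[]`), each subtree is laid out recursively by the same rule rather than by a truly optimal ordering, and subtrees are placed in increasing order of the width of their layouts. The names `ordering_width` and `lemma11_layout` are hypothetical.

```python
from itertools import count


def ordering_width(order, edges):
    """Largest number of edges crossing any prefix/suffix split of `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return max(
        (sum(1 for u, w in edges
             if min(pos[u], pos[w]) < i <= max(pos[u], pos[w]))
         for i in range(1, len(order))),
        default=0,
    )


def lemma11_layout(tree, labels=None):
    """Return (order, edges, bound) for the root-first concatenated ordering.

    `bound` is max_i (width_i + k + 1 - i), with width_i the width of the
    layout used for the i-th subtree (sorted increasingly).
    """
    labels = count() if labels is None else labels
    root = next(labels)
    layouts = []
    for child in tree:
        child_order, child_edges, _ = lemma11_layout(child, labels)
        layouts.append((child_order, child_edges, ordering_width(child_order, child_edges)))
    layouts.sort(key=lambda item: item[2])  # widest subtree goes last (i = k)
    order, edges, bound, k = [root], [], 0, len(layouts)
    for i, (child_order, child_edges, width) in enumerate(layouts, start=1):
        edges.append((root, child_order[0]))  # child's root is first in its layout
        edges.extend(child_edges)
        order.extend(child_order)
        bound = max(bound, width + k + 1 - i)
    return order, edges, bound


tree = [[[], []], [], [[], [], []]]  # root with three subtrees
order, edges, bound = lemma11_layout(tree)
print(ordering_width(order, edges), bound)
```

While the ordering runs through the i-th subtree, the crossing edges are those inside that subtree plus at most k + 1 − i edges at the root, which is where the bound comes from.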

For a collection of random variables $Y_1, \dots, Y_k$ the order statistics are defined as the permutation of the values into increasing order, so that $Y_{(1)} \le \dots \le Y_{(k)}$.

Lemma 12 Let X ∼ Po(d) and let $Y_1, \dots, Y_X$ be an iid sequence distributed as Po(d). There exists C(d) such that

$$W = X + \max_{1\le i\le X}\big(Y_{(i)} - i\big)$$

is stochastically dominated by C + Po(d).

Proof: The probability distribution of the Poisson is given by $P(\mathrm{Po}(d) = w) = \frac{d^we^{-d}}{w!}$, which decays faster than any exponential so

$$\frac{P(\mathrm{Po}(d) \ge w)}{P(\mathrm{Po}(d) = w)} \to 1$$

as w → ∞. With this fast rate of decay we can choose C = C(d) large enough so that the following hold:

• C ≥ 6 is even and for w ≥ C/2,

$$P(\mathrm{Po}(d) \ge w+1) \le P(\mathrm{Po}(d) = w) \qquad (23)$$

• For all w > 0, 

u; + ^ E2 x P(Po(d) >w + ^)< J_P(Po(d) > w) (24) 

• For all w > 0, 

P(Po(d) > Lf J + Cf < P(Po(d) > u> + ^) (25) 
i , , l 



which can be achieved since ,,, s li^v^ 



For all w > 2, 



(Uf 1+cy-r ^ Jw+Wfi' 



w+ 2 r 2w+ ^ p r {d) - 2 -i^ (26) 



• For w ∈ {0, 1},

$$P(W \ge w + C) \le P(\mathrm{Po}(d) \ge w) \qquad (27)$$

Observe that for 1 ≤ i ≤ x,

$$P(Y_{(i)} \ge w \mid X = x) \le \binom{x}{x-i+1}P(\mathrm{Po}(d) \ge w)^{x-i+1} \le 2^xP(\mathrm{Po}(d) \ge w)^{x-i+1} \qquad (28)$$

since if $Y_{(i)} \ge w$ then at least x − i + 1 of the Y's must be greater than or equal to w and there are $\binom{x}{x-i+1}$ such choices of the set. For any y, z ≥ 0 we have that

$$P(\mathrm{Po}(d) = y)P(\mathrm{Po}(d) = z) = \frac{d^{y+z}e^{-2d}}{y!\,z!} = e^{-d}\binom{y+z}{y}P(\mathrm{Po}(d) = y+z) \le 2^{y+z}P(\mathrm{Po}(d) = y+z) \qquad (29)$$

since $\binom{y+z}{y} \le 2^{y+z}$.
Fix a w ≥ 2. Then



P(W >w + C) = P(X + max Y (i) - i > w + C) 

l<i<X y ' 



< P(X > w + —) + p ( x + ™* Y {i) -i>w + C \ X = x)P(X = x) 

X=l 

< P(X = w) + V P(x + max Y«\ - i > w + C I X = x)P(X = i) (30) 

100 ' i<i<x v ' 

x — l 

where the final inequality follows from equation (24). Now

V P(x + max Y (i) - i > w + C I X = x)P(X = x) 

a=l 

™ + T X 

< J2P(x + Y {l) - i > w + C \ X = x)P(X = x) 



x—l i—1 

w+% x 

= J2 Y,P(Y(x- j+ i)>w-j + l + C\X = x)P(X = x) 

x=l j=l 

w+% x 

< 2Xp ( p °(d) >w-j + l + CyP(X = x) 

x=l j=l 

= Yl 2Xp (V°(d) >w-j + l + C) j P(X = x) (31) 

3=1 x=j 

where line 3 follows by setting j = x − i + 1 and line 4 follows from equation (28). We split this sum into 3 parts. First we have that

Y 2 x p(Po(d) >w-j + i + cyp(x = x) 
r w+% r 

<-Y 2 x P(Po(d)>w+-)P(X = x) 

x—l 

<^E2 x P(Po(d)>w + j) 
where the final inequality follows from equation (24). Second,

LfJ ™+% 

2 K P(Po(d) >w-j + l + C) j P(X = x) 

3 = % + l *=j 

<Lfj £ 2 x P(Md)>lj\+C)%P(X = x) 

< L |j£2*P(Po(d)>Lfj+C)* 
<[^]E2 x P(Po(d)> W +^) 

< ^Q p ( p °( d ) ^ w ) (33) 
where line 4 follows from the fact that C/2 ≥ 3 and equation (25) and line 5 follows from equation (24). Finally,

+ £ 



£ £ 2:Ep ( Po ( d ) > w - 3 + 1 + cyp{x = x ) 

j=[f j+l 05=7 



■!« + ■£ 



j=lf J+l Z==J 
™+§ «>+£ 

< E E 2 ™ +i 3 L P(Po(d) > -) L TJp(p (d) = to - x + C)P(Po(d) = x) (34) 

j=|_§ J+l a=j 

where the second line follows since $x \le w + \frac{C}{2}$ and $j \ge \lfloor\frac{w}{2}\rfloor + 1$ and the third line follows from the fact that w − j + 1 + C is greater than both $\frac{C}{2}$ and w − x + C + 1 and applying equation (23), which says that $P(\mathrm{Po}(d) = w - x + C) \ge P(\mathrm{Po}(d) \ge w - x + C + 1)$. Then

2 w+ ^P(Po(d) > -)Lf Jp(Po(d) = «; - a;+ C)P(Po(d) = a:) 

J= Lf J+l z=j 

w+% w + % 

< £ E 2<" + ^P(Po(d) > ^-)LfJ2 u ' +c 'P(Po(d) = w + C) 
j=Lf J+i s=j 

< + ^pj 2 2l " + ¥p(Po(d) > ^)Lf Jp(p (d) = W + C) 

where the second line follows from equation (29), and the final line follows from equation (26). Combining equations (30) through (35) we have that for w ≥ 2,

$$P(W \ge w + C) \le \frac{1}{25}P(\mathrm{Po}(d) \ge w) \le P(\mathrm{Po}(d) \ge w).$$

Combining this with equation (27) completes the proof. ■

We now prove Lemma 10.
Proof: Take C' = C + 1 where C is the constant from Lemma 12. We prove the result by induction on ℓ. When ℓ = 0 a level Galton-Watson branching process tree is just a single vertex, which has cut-width 0, so the statement is trivially satisfied. When ℓ ≥ 1 the subtrees attached to the root are independent (ℓ − 1)-level Galton-Watson branching process trees, so by the inductive hypothesis, Lemma 11 and Lemma 12 we have that ξ(T) is stochastically dominated by the distribution C'ℓ + Po(d). ■



5 Spatial Mixing 

5.1 SAW trees 

Weitz [30] developed the Tree of Self Avoiding Walks construction, which enables the calculation of marginal distributions of a Gibbs measure on a graph by calculating marginal distributions on a specially constructed tree. This construction, along with the censoring inequality, will be a major tool in our proof. For a graph G and a vertex v we denote the tree of self-avoiding paths from v in G as $T_{saw}(G, v)$. This is the tree of paths in G starting from v and not intersecting themselves, except possibly at the terminal vertex of the path. Through this construction each vertex in $T_{saw}(G, v)$ can be identified with a vertex in G, which gives a natural way to relate a subset Λ ⊂ V and a configuration $\eta_\Lambda$ to the corresponding subset and configuration in $T_{saw}(G, v)$, which we denote $\varphi(\Lambda) \subset T_{saw}(G, v)$ and $\eta_{\varphi(\Lambda)}$ respectively. Furthermore if Λ, B ⊂ V then d(Λ, B) = d(φ(Λ), φ(B)). Each vertex (edge) of $T_{saw}(G, v)$ corresponds to a vertex (edge) in G, so $P_{T_{saw}}$ is defined by taking the corresponding external field and interactions. Then Theorem 3.1 of [30] gives the following result.

Lemma 13 (Weitz [30]) For a graph G and v ∈ G there exists $A \subset T_{saw}$ and a configuration $\nu_A$ on A such that for any Λ ⊂ V and configuration $\eta_\Lambda$ on Λ,

$$P_G(\sigma_v = + \mid \sigma_\Lambda = \eta_\Lambda) = P_{T_{saw}}\big(\sigma_v = + \mid \sigma_{\varphi(\Lambda)\setminus A} = \eta_{\varphi(\Lambda)\setminus A},\ \sigma_A = \nu_A\big).$$

The set A is the set of leaves in $T_{saw}$ corresponding to the terminal vertices of paths which return to a vertex already visited by the path. The construction of $\nu_A$ is described in [30].
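To make the structure of the tree concrete, the following sketch enumerates the nodes of the tree of self-avoiding walks for a small graph. It only builds the tree, not the boundary configuration on the cycle-closing leaves. The adjacency-dictionary input, the representation of tree nodes as tuples of graph vertices, and the function name `saw_tree` are assumptions made for illustration; walks are taken to be non-backtracking, so a walk never immediately re-traverses the edge it just used.

```python
def saw_tree(adj, v):
    """Return the nodes of the self-avoiding-walk tree rooted at v.

    Each node is a walk (tuple of vertices) starting at v; its parent is the
    walk with the last vertex removed. A walk that returns to an already
    visited vertex is kept as a leaf and not extended further.
    """
    nodes = []
    stack = [(v,)]
    while stack:
        walk = stack.pop()
        nodes.append(walk)
        if walk[-1] in walk[:-1]:
            continue  # walk closed a cycle: this is a leaf
        for w in sorted(adj[walk[-1]]):
            if len(walk) >= 2 and w == walk[-2]:
                continue  # no immediate backtracking along the same edge
            stack.append(walk + (w,))
    return nodes


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
nodes = saw_tree(triangle, 0)
cycle_leaves = [w for w in nodes if w[-1] in w[:-1]]
copies_of_1 = [w for w in nodes if w[-1] == 1]
print(len(nodes), len(cycle_leaves), len(copies_of_1))
```

On the triangle each non-root vertex appears twice in the tree, once for each direction around the cycle, and the two cycle-closing walks are the leaves. On a tree the construction returns the tree itself.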

5.2 Spatial Correlations on trees 

We consider the effect that conditioning the vertices of a tree has on the marginal distribution of the spin at the root. It will be convenient to compare this probability to the Ising model with the same interaction strengths $\beta_{uv}$ but no external field (h ≡ 0), which we will denote $\hat P$.

Lemma 14 Suppose that T is a tree, P is the Ising model with arbitrary external field (including $h_u = \pm\infty$, meaning that $\sigma_u$ is set to ±) and $0 \le \beta_{u,v} \le \beta$ for all (u, v) ∈ E. Let U ⊂ A ⊂ V, and let $\eta^+, \eta^-$ be two configurations on A which differ only on U with $\eta^+_U = +$, $\eta^-_U = -$. Then for all v ∈ V,

$$0 \le P(\sigma_v = + \mid \sigma_A = \eta^+) - P(\sigma_v = + \mid \sigma_A = \eta^-) \le \sum_{u\in U}(\tanh\beta)^{d(u,v)}.$$

Proof: The inequality

$$0 \le P(\sigma_v = + \mid \sigma_A = \eta^+) - P(\sigma_v = + \mid \sigma_A = \eta^-)$$

simply follows from the monotonicity of the ferromagnetic Ising model. Now suppose that the set U is a single vertex u. Lemma 4.1 of [2] implies that for any vertices v, u ∈ T,

$$P(\sigma_v = + \mid \sigma_u = +) - P(\sigma_v = + \mid \sigma_u = -) \le \hat P(\sigma_v = + \mid \sigma_u = +) - \hat P(\sigma_v = + \mid \sigma_u = -). \qquad (36)$$

If $u_0, u_1, \dots, u_k$ are a path of vertices in T then a simple calculation yields that

$$\hat P(\sigma_{u_k} = + \mid \sigma_{u_0} = +) - \hat P(\sigma_{u_k} = + \mid \sigma_{u_0} = -) = \prod_{i=1}^{k}\tanh\beta_{u_{i-1}u_i} \le (\tanh\beta)^k. \qquad (37)$$

Conditioning is equivalent to setting an infinite external field so equations (36) and (37) imply that

$$P(\sigma_v = + \mid \sigma_A = \eta^+) - P(\sigma_v = + \mid \sigma_A = \eta^-) \le (\tanh\beta)^{d(u,v)}. \qquad (38)$$

We now consider a general U. Let $u_1, \dots, u_{|U|}$ be an arbitrary labeling of the vertices of U. Take a sequence of configurations $\eta^0, \eta^1, \dots, \eta^{|U|}$ on A with $\eta^0 = \eta^-$ and $\eta^{|U|} = \eta^+$ where consecutive configurations $\eta^{i-1}$ and $\eta^i$ differ only at $u_i$ with $\eta^i_{u_i} = +$ and $\eta^{i-1}_{u_i} = -$. By equation (38) we have that

$$P(\sigma_v = + \mid \sigma_A = \eta^i) - P(\sigma_v = + \mid \sigma_A = \eta^{i-1}) \le (\tanh\beta)^{d(u_i,v)}$$

and so

$$P(\sigma_v = + \mid \sigma_A = \eta^+) - P(\sigma_v = + \mid \sigma_A = \eta^-) \le \sum_{u\in U}(\tanh\beta)^{d(u,v)}$$

which completes the proof.
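The "simple calculation" for the zero-field model on a path can be verified by brute-force enumeration. The sketch below assumes the Gibbs weight exp(Σ β_i σ_{i−1} σ_i) on a chain with no external field; the function name `root_influence` is hypothetical.

```python
import math
from itertools import product


def root_influence(betas):
    """P(s_k = + | s_0 = +) - P(s_k = + | s_0 = -) on a zero-field chain.

    betas[i] is the interaction strength on the edge between sites i and i+1.
    """
    k = len(betas)

    def prob_end_plus(s0):
        num = den = 0.0
        for rest in product((-1, 1), repeat=k):
            spins = (s0,) + rest
            weight = math.exp(sum(b * spins[i] * spins[i + 1]
                                  for i, b in enumerate(betas)))
            den += weight
            if spins[-1] == 1:
                num += weight
        return num / den

    return prob_end_plus(1) - prob_end_plus(-1)


betas = [0.3, 0.7, 0.1]
exact = math.prod(math.tanh(b) for b in betas)
print(root_influence(betas), exact)
```

The enumeration agrees with the product of the tanh β terms along the path, which is at most (tanh β)^k when every interaction is bounded by β.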



5.3 Continuous time to discrete time 

Lemma 15 Suppose that in continuous time starting from the all + and all − configurations the Gibbs sampler under the monotone coupling couples with probability at least 7/8 by time T ≥ 1. Then the Gibbs sampler in discrete time under the monotone coupling couples with probability at least $1 - \frac{1}{2e}$ by time ⌈5Tn⌉ and hence has mixing time at most ⌈5Tn⌉.

Proof: Let M denote the number of updates of the continuous dynamics up to time T. Then M is distributed as a Poisson random variable with mean Tn. For an integer m, the final state of the continuous time Gibbs sampler conditioned on M = m is the same as the final state of the discrete Gibbs sampler with m steps. So the probability of coupling in discrete time after m steps is at least 7/8 − P(Po(Tn) ≥ m). So if m ≥ 5Tn then by Markov's inequality

$$P(\mathrm{Po}(Tn) \ge m) \le \frac{Ee^{\mathrm{Po}(Tn)}}{e^{5Tn}} = e^{Tn(e-1)-5Tn} \le e^{-3}.$$

Since $\frac{7}{8} - e^{-3} \ge 1 - \frac{1}{2e}$, the discrete chain couples by time ⌈5Tn⌉ with probability at least $1 - \frac{1}{2e}$. Hence the mixing time is at most ⌈5Tn⌉. ■
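The Poisson tail estimate in this proof is easy to sanity-check numerically. The sketch below compares the exact tail P(Po(λ) ≥ ⌈5λ⌉) with the exponential-moment bound e^{λ(e−1)−m} for a few values of λ = Tn ≥ 1; the helper names `poisson_tail` and `exp_moment_bound` are assumptions made here.

```python
import math


def poisson_tail(lam, m, terms=500):
    """P(Po(lam) >= m), summing the probability mass function from m upward."""
    term = math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))
    total = 0.0
    for k in range(m, m + terms):
        total += term
        term *= lam / (k + 1)
    return total


def exp_moment_bound(lam, m):
    """Markov's inequality applied to exp(Po(lam)): E e^{Po(lam)} / e^m."""
    return math.exp(lam * (math.e - 1.0) - m)


for lam in (1.0, 2.0, 3.5, 10.0, 50.0):
    m = math.ceil(5 * lam)
    print(lam, poisson_tail(lam, m), exp_moment_bound(lam, m))
```

For every λ ≥ 1 the bound is at most e^{−3}, since λ(e − 1) − 5λ ≤ −3; the exact tail is considerably smaller.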

5.4 Proof of Lemma 3

We now prove Lemma 3 by applying Lemmas 13 and 14 to a small graph centered at v.
Proof:

Let T denote the tree of self avoiding walks on G from v, $T_{saw}(G, v)$. Let φ(S(v, R)) denote the vertices in T which correspond to vertices in S(v, R) and for each u ∈ S(v, R) let φ(u) denote the set of vertices in T which correspond to u. Then, writing Λ = S(v, R), by Lemma 13 and Lemma 14

$$a_u = \sup_{\eta^+,\eta^-}\Big[P_{T_{saw}}\big(\sigma_v = + \mid \sigma_{\varphi(\Lambda)\setminus A} = \eta^+_{\varphi(\Lambda)\setminus A},\ \sigma_A = \nu_A\big) - P_{T_{saw}}\big(\sigma_v = + \mid \sigma_{\varphi(\Lambda)\setminus A} = \eta^-_{\varphi(\Lambda)\setminus A},\ \sigma_A = \nu_A\big)\Big] \le \sum_{w\in\varphi(u)}\tanh^{d(v,w)}\beta. \qquad (39)$$

Applying this bound

$$\sum_{u\in S(v,R)}a_u \le \sum_{u\in S(v,R)}\sum_{w\in\varphi(u)}\tanh^{d(v,w)}\beta = \sum_{w\in\varphi(S(v,R))}\tanh^{d(v,w)}\beta \le \sum_{w\in T\,:\,d(w,v)\ge R}\tanh^{d(v,w)}\beta$$

where the final inequality follows from the fact that d(v, φ(S(v, R))) ≥ R. Now since T has maximum degree d, for each ℓ there are at most $d(d-1)^{\ell-1}$ vertices at distance ℓ from v. It follows that

$$\sum_{u\in S(v,R)}a_u \le \sum_{w\in T\,:\,d(w,v)\ge R}\tanh^{d(v,w)}\beta \le \sum_{\ell=R}^{\infty}d(d-1)^{\ell-1}\tanh^\ell\beta = \frac{d(d-1)^{R-1}\tanh^R\beta}{1-(d-1)\tanh\beta} \le \frac{1}{4}$$

as required. ■
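For concrete parameters, the radius R needed in Lemma 3 can be found by direct search on the closed-form geometric sum. This is a small sketch; the function names `lemma3_lhs` and `smallest_radius` are assumptions made for illustration.

```python
import math


def lemma3_lhs(d, beta, R):
    """d (d-1)^(R-1) tanh(beta)^R / (1 - (d-1) tanh(beta))."""
    t = math.tanh(beta)
    return d * t * ((d - 1) * t) ** (R - 1) / (1.0 - (d - 1) * t)


def smallest_radius(d, beta, max_R=10_000):
    """Smallest integer R >= 1 with lemma3_lhs(d, beta, R) <= 1/4."""
    if (d - 1) * math.tanh(beta) >= 1.0:
        raise ValueError("requires (d - 1) * tanh(beta) < 1")
    for R in range(1, max_R + 1):
        if lemma3_lhs(d, beta, R) <= 0.25:
            return R
    raise RuntimeError("no admissible R found up to max_R")


print(smallest_radius(3, 0.2))  # maximal degree 3, interaction strength 0.2
```

As (d − 1) tanh β approaches 1 the geometric ratio approaches 1 and the required radius grows without bound, which reflects the threshold in the hypothesis of the lemma.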

5.5 Proof of Lemma 6

We now prove Lemma 6.
Proof:

We need to establish the spatial mixing condition. Recall that

$$a_u = \sup_{\eta^+,\eta^-}\Big[P(\sigma_v = + \mid \sigma_{S(v,R)} = \eta^+) - P(\sigma_v = + \mid \sigma_{S(v,R)} = \eta^-)\Big]$$

and by equation (39)

$$a_u \le \sum_{w\in\varphi(u)}\tanh^{d(v,w)}\beta.$$

Now t(v, R) ≤ 1 with high probability for all v ∈ V by Lemma 7, so B(v, R) is a tree or unicyclic. Hence every u ∈ S(v, R) appears at most twice in the tree of self-avoiding walks, which gives |φ(u)| ≤ 2 and d(v, φ(u)) = R. Thus for all v ∈ V with high probability

$$\sum_{u\in S(v,R)}a_u \le \sum_{u\in S(v,R)}\sum_{w\in\varphi(u)}\tanh^{d(v,w)}\beta \le 2X\tanh^R\beta = 2(d\tanh\beta)^R\log n = o(1)$$

which establishes the spatial mixing condition.



6 Infinite Graphs 

Up until this point we have only dealt with finite graphs; however, the Ising model and the Glauber dynamics can be defined on infinite graphs as well (see e.g. [14]). The spatial mixing property of uniqueness says that there is a unique Gibbs measure for the interacting particle system; one formulation of this is that for every finite set A ⊂ V we have that

$$\limsup_{R\to\infty}\sup_{\eta,\eta'}\big\|P(\sigma_A = \cdot \mid \sigma_{S(A,R)} = \eta) - P(\sigma_A = \cdot \mid \sigma_{S(A,R)} = \eta')\big\|_{TV} = 0$$

where S(A, R) = {u ∈ V : d(u, A) = R} and η, η' are configurations on S(A, R). This says that the configuration on A is asymptotically independent of the spins a large distance away. In the context of the ferromagnetic Ising model this is equivalent to

$$P(\sigma_v = + \mid \sigma_{S(v,R)} = +) - P(\sigma_v = + \mid \sigma_{S(v,R)} = -) \to 0 \qquad (40)$$

for all v ∈ V as R → ∞. Combining Lemmas 13 and 14 it follows that Condition (1) implies uniqueness. This was also noted in [?].

The following lemma shows that given uniqueness the Glauber dynamics on an infinite graph can locally be approximated by the Glauber dynamics of the Ising model on finite graphs. For a fixed finite set U ⊂ V let $\sigma^\ell$ denote a random configuration according to the stationary distribution of the Ising model on the induced subgraph $G_\ell$ whose vertex set is given by $U_\ell := \{u\in V : d(u,U) \le \ell\}$. Let $\sigma^\ell(t)$ denote the Glauber dynamics of this Ising model started from the stationary distribution.

Lemma 16 Let G be an infinite graph with maximum degree d and suppose for some $\{\beta_{(u,v)}\}$ and $\{h_u\}$ the Ising model has the uniqueness property and let U be a finite subset of V. With $\sigma^\ell(t)$ defined as above

$$(\sigma^\ell_U(0), \sigma^\ell_U(1)) \to (\sigma_U(0), \sigma_U(1))$$

jointly in distribution as ℓ → ∞.

Proof: Fix an ε > 0. It is sufficient to show that for some ℓ' we can couple $(\sigma^\ell_U(0), \sigma^\ell_U(1))$ and $(\sigma_U(0), \sigma_U(1))$ with probability at least 1 − ε when ℓ ≥ ℓ'. Fix some positive integer m large enough so that

$$P(\mathrm{Poisson}(1) \ge m) \le \tfrac{1}{2}\varepsilon d^{-m}|U|^{-1}.$$

By the uniqueness property as ℓ → ∞ we have that $\sigma^\ell_{U_m}$ converges in distribution to $\sigma_{U_m}$. So for some ℓ' when ℓ ≥ ℓ' we can couple initial configurations $\sigma^\ell(0)$ and σ(0) so that $\sigma^\ell_{U_m}(0)$ and $\sigma_{U_m}(0)$ agree with probability at least 1 − ε/2. Now couple the Glauber dynamics by using the same sequence of updates for each chain within $U_\ell$.

We now bound the probability that there is disagreement between $\sigma^\ell_U(1)$ and $\sigma_U(1)$ given that $\sigma^\ell_{U_m}(0)$ and $\sigma_{U_m}(0)$ agree. We will call a sequence $u_1, \dots, u_k$ of vertices a path if $u_i$ and $u_{i+1}$ are adjacent for each i. An update can only create a disagreement at a vertex if a neighboring vertex already has a disagreement. Hence a vertex u can only have a disagreement by time 1 if there is a path of vertices $u_1, \dots, u_k = u$ such that the vertices in the path are updated by the Glauber dynamics in that order before time 1 and $u_1 \in U_m\setminus U_{m-1}$.

Hence the event $\sigma^\ell_U(1) \ne \sigma_U(1)$ is dominated by the event that there is a path of updates of vertices $u_1, \dots, u_m$, updated in that order before time 1 with $u_m \in U$. For each fixed path the probability that those vertices are updated in that order is P(Poisson(1) ≥ m). There are at most $d^m|U|$ such paths of vertices so by a union bound and our choice of m the probability of a disagreement reaching U is at most ε/2. It follows that we can couple $(\sigma^\ell_U(0), \sigma^\ell_U(1))$ and $(\sigma_U(0), \sigma_U(1))$ with probability at least 1 − ε, which completes the proof.
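The choice of m in this argument exists because the Poisson(1) tail decays like 1/m!, which beats the d^m growth in the number of paths. A small sketch that finds the smallest admissible m for given ε, d and |U| follows; the function names are assumptions made for illustration.

```python
import math


def poisson_tail(lam, m, terms=500):
    """P(Po(lam) >= m), summing the probability mass function from m upward."""
    term = math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))
    total = 0.0
    for k in range(m, m + terms):
        total += term
        term *= lam / (k + 1)
    return total


def smallest_path_length(eps, d, size_U, max_m=200):
    """Smallest m with P(Poisson(1) >= m) * d^m * |U| <= eps / 2."""
    for m in range(1, max_m + 1):
        if poisson_tail(1.0, m) * float(d) ** m * size_U <= 0.5 * eps:
            return m
    raise RuntimeError("no admissible m found up to max_m")


print(smallest_path_length(0.1, 3, 1))
```

With this m, the union bound over at most d^m |U| paths gives a disagreement probability of at most ε/2, as used in the proof.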



We now show how the spectral gap bounds for the finite graph dynamics imply spectral gap bounds for infinite graph dynamics. The following lemma completes the proof of Theorem 1.

Lemma 17 Let G be an infinite graph with maximum degree d and suppose for some $\{\beta_{(u,v)}\}$ and $\{h_u\}$ the Ising model has the uniqueness property. Further suppose that for every finite subgraph G' of G the Ising model on G' has continuous time spectral gap bounded below by $\lambda^*$. Then the infinite volume dynamics has spectral gap bounded below by $\lambda^*$.

Proof: First we may assume that the graph is connected since the spectral gap is the minimum of the spectral gaps of the dynamics projected onto individual components. We will use the characterization of the spectral gap that

$$\mathrm{Gap} = -\log\sup_f\frac{\mathrm{Cov}(f(\sigma(0)), f(\sigma(1)))}{\mathrm{Var}f(\sigma(0))}$$

where the supremum is over all square integrable functions $f : \{+,-\}^V \to \mathbb{R}$ with Ef = 0. Fix a vertex v and for such a function f we define the bounded function $f_R : \{+,-\}^{B(v,R)} \to \mathbb{R}$ by

$$f_R(\sigma) = E\big(f(\sigma) \mid \sigma_{B(v,R)}\big).$$

Since every vertex is ultimately in B(v, R) for R sufficiently large, by the $L^2$ Martingale Convergence Theorem $f_R(\sigma)$ converges to f(σ) in $L^2$ and so

$$\lim_{R\to\infty}\frac{\mathrm{Cov}(f_R(\sigma(0)), f_R(\sigma(1)))}{\mathrm{Var}f_R(\sigma(0))} = \frac{\mathrm{Cov}(f(\sigma(0)), f(\sigma(1)))}{\mathrm{Var}f(\sigma(0))}.$$

In particular this means that in the supremum we only need consider bounded functions which are determined by a finite number of spins. So suppose that g is such a bounded function depending only on $\sigma_U$ for some finite U ⊂ V.

By Lemma 16 we have that $(\sigma^\ell_U(0), \sigma^\ell_U(1))$ converges jointly in distribution to $(\sigma_U(0), \sigma_U(1))$. Hence using our assumption on the spectral gap on finite subgraphs we have that

$$\lambda^* \le \lim_{\ell\to\infty}-\log\frac{\mathrm{Cov}(g(\sigma^\ell(0)), g(\sigma^\ell(1)))}{\mathrm{Var}\,g(\sigma^\ell(0))} = -\log\frac{\mathrm{Cov}(g(\sigma(0)), g(\sigma(1)))}{\mathrm{Var}\,g(\sigma(0))}$$

which establishes $\lambda^*$ as a lower bound on the spectral gap. ■
7 Conclusion

The proof of Theorem 3 naturally extends to more general monotone systems. Moreover, instead of censoring outside a ball of radius R about a vertex v we could instead look at general well chosen sets $v \in W_v \subset V$. We let $S_v$ denote the boundary set $\{u\in V\setminus W_v : d(u, W_v) = 1\}$. We consider the following setup. There is a spin set Ω which is ordered with a maximal element + and a minimal element −. The order on Ω naturally extends to a partial order on $\Omega^V$, where V is the vertex set of a graph, by letting $\sigma_1 \le \sigma_2$ if and only if $\sigma_1(v) \le \sigma_2(v)$ for all v ∈ V. A measure P on $\Omega^V$ is called monotone if for all v ∈ V and all a ∈ Ω

$$P[\sigma(v) \ge a \mid \sigma(w : w\ne v) = \sigma_1] \ge P[\sigma(v) \ge a \mid \sigma(w : w\ne v) = \sigma_2]$$

whenever $\sigma_1 \ge \sigma_2$. We may now state a generalization of Theorem 3.

Theorem 4 Let G be a graph on n ≥ 2 vertices and let P(σ) be any monotone Gibbs measure on G.

Suppose that there exist constants T, X ≥ 1 and for each v ∈ V there is a subset $W_v \subset V$ containing v such that the following three conditions hold:

• Volume: The volume of $W_v$ satisfies $|W_v| \le X$.

• Local Mixing: For any configuration η on $S_v$ the continuous time mixing time of the Gibbs sampler on $W_v$ with fixed boundary condition η is bounded above by T.

• Spatial Mixing: For each vertex $u \in S_v$ define

$$a_u = \sup_{\eta^1,\eta^2}d_{TV}\big(P(\sigma_v = \cdot \mid \sigma_{S_v} = \eta^1),\ P(\sigma_v = \cdot \mid \sigma_{S_v} = \eta^2)\big) \qquad (42)$$

where the supremum is over configurations $\eta^1, \eta^2$ on $S_v$ which differ only at u. Then

$$\sum_{u\in S_v}a_u \le \frac{1}{4}.$$

Then starting from the all + and all − configurations in continuous time the monotone coupling couples with probability at least 7/8 by time $T\lceil\log 8X\rceil(3+\log_2 n)$.

It follows that the mixing time of the Gibbs sampler in continuous time satisfies

$$\tau_{mix} \le T\lceil\log 8X\rceil(3+\log_2 n).$$

While Theorem 4 applies to general monotone systems, the use of the Censoring Lemma of Peres and Winkler does not allow us to extend it to non-monotone systems such as random colourings. A major open problem is to relate spatial mixing to temporal mixing in non-monotone settings, for example for the hardcore model, the antiferromagnetic Ising model, or the colouring model.

7.1 Open Problems 

We showed that Condition (1) establishes a uniform lower bound on the spectral gap of the continuous time dynamics over all graphs. It would be of interest to establish whether or not this is also true for bounds on the Log-Sobolev constant.

As discussed in the introduction our results give rise to the following conjecture concerning non-monotone systems.

Conjecture 1 The Gibbs sampler for the antiferromagnetic Ising model (with no external field) is rapidly mixing on any graph whose maximum degree is d, for any inverse temperature β below the uniqueness threshold for the Ising model on the d-regular tree.

Similarly the Gibbs sampler for the hardcore model is rapidly mixing on any graph whose maximum degree is d for any fugacity λ below the uniqueness threshold for the hardcore model on the d-regular tree.

We recall that for both of these models the mixing time on almost all random d-regular bipartite graphs is exponential in n, the size of the graph, beyond the uniqueness threshold [25], [8], [5], so our conjecture is that uniqueness on the tree exactly corresponds to rapid mixing of the Gibbs sampler. A similar conjecture can be made with respect to the coloring model.